Myria: Making Strides in Big Data Management as a Service | Intel …


By Magdalena Balazinska, University of Washington

At the recent SIGMOD 2014 conference, we demonstrated the Myria Big Data Management and Analytics Service.

Myria is a cloud service developed and operated by the University of Washington Database Group and eScience Institute. The Myria design meets requirements from real users and complex workflows, especially in science. Myria provides a unique combination of features:

Myria is offered directly as a service: No software installation, no cluster deployment, no virtual machines to worry about.
Myria is highly expressive: It understands SQL and Datalog, and also provides a new hybrid declarative/imperative language, MyriaL, that enables the expression of sophisticated analytics using primarily declarative constructs.
Myria has an efficient back end: Myria is a parallel, shared-nothing system. Its core design is a combination of state-of-the-art approaches and new algorithms for the optimization and execution of large-scale, possibly iterative queries.

The Myria service is currently deployed in the private cloud of the University of Washington database group. Users access Myria through their Web browsers: they simply point their browsers at the service and start analyzing their data.

Users access Myria through their Web browsers. (Source: Magdalena Balazinska, University of Washington.)

Myria is a layered system (see figure 1 below).

The back end is a parallel, shared-nothing query processing engine called MyriaX. MyriaX is a relational engine whose query plans can be either tree-shaped or contain loops. It can execute iterative queries (plans with loops) either synchronously or asynchronously. MyriaX uses PostgreSQL as its main storage subsystem, but it can also read data from HDFS, S3, or the local file system on each machine.
The MyriaX coordinator exposes a RESTful interface to layers higher up in the Myria stack. We use that interface to build applications directly on top of MyriaX.
Most commonly, though, applications do not communicate directly with MyriaX. Instead, they talk to MyriaQ, the web-based query compiler and optimizer. MyriaQ accepts queries written in Datalog, SQL, or our new language MyriaL, parses and optimizes them, and then outputs MyriaX query plans for execution. MyriaQ can also generate plans for other back ends. We will present MyriaQ and the MyriaL language in more detail in a separate post.

Figure 1: Myria has a layered system architecture comprising a back end (a parallel, shared-nothing query processing engine); a coordinator that exposes a RESTful interface to layers higher up in the Myria stack (enabling applications to be built directly on the back end); and a web-based query compiler and optimizer. (Source: Magdalena Balazinska, University of Washington.)
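To make the distinction between synchronous and asynchronous execution of iterative queries concrete, here is a minimal single-process sketch using connected components via label propagation, a classic iterative query. It is an illustration under our own assumptions, not MyriaX code: MyriaX runs such loops in parallel across shared-nothing workers, and the function names below are hypothetical. In the synchronous version, each round reads only the labels produced by the previous round. In the asynchronous version, a label change is visible to neighbors immediately. Both reach the same fixpoint.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def cc_synchronous(edges):
    """Synchronous iteration: every round reads only the previous round's labels."""
    adj = build_adjacency(edges)
    labels = {v: v for v in adj}
    rounds = 0
    while True:
        rounds += 1
        # Each vertex takes the minimum of its own label and its neighbors' labels.
        new_labels = {v: min([labels[v]] + [labels[n] for n in adj[v]]) for v in adj}
        if new_labels == labels:  # no change: fixpoint reached
            return labels, rounds
        labels = new_labels


def cc_asynchronous(edges):
    """Asynchronous iteration: label updates are applied and propagated immediately."""
    adj = build_adjacency(edges)
    labels = {v: v for v in adj}
    worklist = list(adj)  # vertices whose current label may still need to propagate
    updates = 0
    while worklist:
        v = worklist.pop()
        for n in adj[v]:
            if labels[v] < labels[n]:
                labels[n] = labels[v]
                updates += 1
                worklist.append(n)  # n changed, so it must propagate again
    return labels, updates


edges = [(1, 2), (2, 3), (4, 5)]
print(cc_synchronous(edges)[0])   # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
print(cc_asynchronous(edges)[0])  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

The synchronous variant has clean round boundaries, which simplifies reasoning and failure recovery. The asynchronous variant avoids waiting for the slowest worker at each round.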

The SIGMOD demonstration showed the fundamental Myria capabilities related to (1) using Myria as a service, (2) expressing various types of analysis in MyriaL, (3) efficiently executing them on the MyriaX back end, and (4) building specialized applications on top of Myria. The demonstration also focused on several of our research projects. We showed:

The MyriaL query language.
New multi-way join algorithms for efficient parallel query processing in shared-nothing clusters.
A study of the interactions between iterative query execution and different types of failure-handling methods.
New, powerful query visualization and debugging tools.
A new type of Service Level Agreement for cloud data services.
A specialized service called MyMergerTree that we built on top of Myria to enable the exploration of galactic merger trees.
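This post does not say which multi-way join algorithms were demonstrated. As an illustration only, here is a sketch of one well-known technique from this line of research: the HyperCube (also called Shares) shuffle. The example uses the triangle query R(x,y), S(y,z), T(z,x). Workers form a px × py × pz grid. Each tuple is hashed on the variables it contains and replicated along the dimension it lacks. After this single shuffle, every worker can join its fragments locally with no further communication. The function and parameter names are hypothetical, and the "cluster" is simulated in one process.

```python
from itertools import product


def h(value, buckets):
    """Hash a join value to one coordinate along a grid dimension."""
    return hash(value) % buckets


def hypercube_triangles(R, S, T, shares=(2, 2, 2)):
    """Simulate a HyperCube shuffle plus local joins for R(x,y), S(y,z), T(z,x)."""
    px, py, pz = shares  # one share per variable; px * py * pz workers in total
    cells = {c: ([], [], []) for c in product(range(px), range(py), range(pz))}
    for x, y in R:  # R knows x and y, so it is replicated along z
        for k in range(pz):
            cells[(h(x, px), h(y, py), k)][0].append((x, y))
    for y, z in S:  # S knows y and z, so it is replicated along x
        for i in range(px):
            cells[(i, h(y, py), h(z, pz))][1].append((y, z))
    for z, x in T:  # T knows z and x, so it is replicated along y
        for j in range(py):
            cells[(h(x, px), j, h(z, pz))][2].append((z, x))

    result = []
    for r_part, s_part, t_part in cells.values():  # each worker joins locally
        s_by_y = {}
        for y, z in s_part:
            s_by_y.setdefault(y, []).append(z)
        t_set = set(t_part)
        for x, y in r_part:
            for z in s_by_y.get(y, []):
                if (z, x) in t_set:
                    result.append((x, y, z))
    return result


edges = [(1, 2), (2, 3), (3, 1), (1, 3), (3, 4), (4, 1)]
print(sorted(hypercube_triangles(edges, edges, edges)))
```

Each output triangle (x, y, z) is produced by exactly one worker, the one at grid cell (h(x), h(y), h(z)). The union of the local results is therefore the full answer, with no duplicates.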

More details about these features, including videos of the individual demos, are available on our project website.

In this blog post, we briefly expand on one specific feature that we demonstrated: Personalized Service Level Agreements.

An important goal of the Myria project is to study the challenges related to offering a data management and analytics system as a cloud service. One critical challenge of today's data management services is their pricing models: users need to pick the amount of resources (e.g., service instances and instance sizes) they want to pay for, which requires extensive prior experience to translate resources into resulting query performance.

A new feature, the Personalized Service Level Agreement (PSLA), helps users easily translate resource requirements into resulting query performance, so they know exactly how much data management and analytic capacity to buy.

To address this challenge, we developed an abstraction called a Personalized Service Level Agreement (PSLA). A PSLA offers users a selection of service tiers, each expressed as the price to pay to achieve a given level of performance for queries over the data the user submitted. The key idea is that the user uploads his or her data (actually, users only need to upload the schema and statistics about their data, such as the table cardinalities). The system then generates a PSLA specialized for the user's data. The PSLA shows templates for queries that can be executed on the user's data. Templates are grouped into clusters, and each cluster is associated with an expected run time: the expected time to evaluate any query that follows one of the templates in the cluster. Groups of template clusters form tiers of service. Each tier comes with a fixed hourly price and corresponds to one configuration of the Myria service. The following figure illustrates what a PSLA looks like.

A Personalized Service Level Agreement (PSLA) in Myria. (Source: Magdalena Balazinska, University of Washington.)
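The PSLA structure described above (tiers that contain template clusters, each cluster carrying an expected run time, each tier carrying an hourly price) can be modeled as a small data structure. The sketch below is hypothetical. The class names, the exact-string template matching, the `cheapest_tier` helper, and all prices and run times are invented for illustration and are not Myria's implementation or real PSLA numbers. It shows how a user might read a PSLA: find the cheapest tier whose expected run time for a query template meets a deadline.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateCluster:
    expected_runtime_s: float  # expected time for any query following a template here
    templates: list[str]       # hypothetical: templates written as plain strings


@dataclass
class Tier:
    name: str
    hourly_price: float        # fixed hourly price for this service configuration
    clusters: list[TemplateCluster] = field(default_factory=list)

    def runtime_for(self, template):
        """Expected run time for a template in this tier, or None if not offered."""
        for cluster in self.clusters:
            if template in cluster.templates:  # assumption: exact-match lookup
                return cluster.expected_runtime_s
        return None


def cheapest_tier(psla, template, deadline_s):
    """Lowest-priced tier whose expected run time for `template` meets the deadline."""
    eligible = [
        tier for tier in psla
        if (rt := tier.runtime_for(template)) is not None and rt <= deadline_s
    ]
    return min(eligible, key=lambda tier: tier.hourly_price, default=None)


# Invented example PSLA with two tiers and two templates.
psla = [
    Tier("Tier 1", 0.50, [TemplateCluster(60, ["scan(T1)"]),
                          TemplateCluster(600, ["join(T1,T2)"])]),
    Tier("Tier 2", 1.20, [TemplateCluster(30, ["scan(T1)"]),
                          TemplateCluster(120, ["join(T1,T2)"])]),
]

print(cheapest_tier(psla, "join(T1,T2)", 300).name)  # Tier 2
print(cheapest_tier(psla, "scan(T1)", 300).name)     # Tier 1
```

Expressing the choice in terms of price and expected run time, rather than instance counts and sizes, is what lets users pick a tier without first translating resources into query performance themselves.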

The SIGMOD ’14 demonstration showed the PSLAs that Myria generates for an example dataset from the astronomy domain. The video of the demonstration is available on our project website.

References:

Myria project website

Paper: Daniel Halperin, Victor Teixeira de Almeida, Lee Lee Choo, Shumo Chu, Paraschos Koutris, Dominik Moritz, Jennifer Ortiz, Vaspol Ruamviboonsuk, Jingjing Wang, Andrew Whitaker, Shengliang Xu, Magdalena Balazinska, Bill Howe, Dan Suciu: “Demonstration of the Myria Big Data Management Service.” SIGMOD Conference 2014: 881-884

