Databricks ropes in Alteryx to push Spark adoption for big data …

By Eric Blattberg

Databricks thinks the open-source Spark engine is the next big thing for big data processing — so it has teamed up with analytics firm Alteryx to supercharge the software.

The two data startups intend to drive Spark into the hands of more data analysts through a formal partnership, Databricks and Alteryx have revealed to VentureBeat. They will become the primary committers to Apache Spark, the open-source, in-memory engine often seen as the leading candidate to replace MapReduce, the companies said.

MapReduce, originally conceived at Google, is the initial programming model for the Hadoop ecosystem of open-source tools for analyzing lots of different kinds of data. But while MapReduce boasts strong scalability, fault tolerance, and throughput, it generally runs jobs on a batch basis. That is quite limiting in terms of latency and accessibility, argued Alteryx chief operating officer George Mathew in a conversation with VentureBeat.

You need a custom MapReduce programmer every time you want to get something out of Hadoop, but that’s not the case for Spark, said Mathew. Alteryx is working toward a standardized Spark interface for asking questions directly against data sets, which broadens Spark’s accessibility from hundreds of thousands of data scientists to millions of data analysts: folks who know how to write SQL queries and model data effectively, but aren’t experts in writing MapReduce jobs in Java.
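To make that contrast concrete, here is a minimal sketch in plain Python. It is not Spark, Hadoop, or Alteryx's interface: the sample data and function names are hypothetical, and SQLite from Python's standard library stands in for a SQL engine. The first function spells out the map, shuffle, and reduce phases by hand; the second asks the same question with a single SQL query.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, sale amount) records.
SALES = [("east", 120.0), ("west", 80.0), ("east", 30.0), ("west", 45.5)]


def total_by_region_mapreduce(records):
    """Hand-written map, shuffle, and reduce phases in plain Python."""
    # Map: emit one (key, value) pair per input record.
    mapped = [(region, amount) for region, amount in records]
    # Shuffle: group the values by key.
    grouped = defaultdict(list)
    for region, amount in mapped:
        grouped[region].append(amount)
    # Reduce: collapse each group to a single value.
    return {region: sum(amounts) for region, amounts in grouped.items()}


def total_by_region_sql(records):
    """The same question asked declaratively with one SQL query."""
    # SQLite is only a stand-in here for whatever SQL engine sits on the data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    conn.close()
    return dict(rows)
```

Both functions return the same totals per region. The difference is who has to write the grouping logic: the programmer in the first case, the query engine in the second.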

The Spark framework is well equipped to handle those queries, as it exploits the memory spread across all of the servers in a cluster. That means it can run analytics models at blazing-fast speeds compared to MapReduce: Programs can go as much as 100 times faster in memory or 10 times faster on disk. Those performance enhancements, and the subsequent customer demand, have prompted Hadoop distribution vendors like Cloudera and MapR to support Spark.

Databricks, founded by the creators of Spark, today announced $33 million in new funding, bringing its total venture financing to $47 million. It also revealed a new service for running Spark jobs and visualizing data on a Databricks-owned cloud. That’s another move by Databricks to make Spark as accessible as possible, a goal the Alteryx partnership will help push forward.

“We want to create a whole new generation of data blenders and analytics modelers that were never able to touch this stuff before,” Mathew said. “We’re just really excited to be working on this together.”
