Who's hitching their wagons to which Big Data frameworks …

Who’s who in Big Data? More specifically, who’s hitching their wagons to which frameworks in their efforts to establish Big Data solutions in the enterprise? That’s one question we set out to answer at last week’s BigDataSV event, where we asked industry leaders and analysts what they’re seeing in the market. The three names mentioned most often were HBase, MapR, and YARN. Each takes a different approach to supporting mission-critical Big Data applications in today’s enterprise. Talking to our #techathletes on theCUBE, we looked at some of the top solutions for deploying Big Data in the enterprise, learning directly from those on the front lines.

Camp 1: HBase

WANdisco all-in on HBase

WANdisco CEO David Richards and Jagane Sundar, CTO and VP of Engineering of Big Data, joined theCUBE at #BigDataSV earlier this month to discuss the company’s latest developments and current Big Data trends. This month WANdisco released a new Non-Stop Hadoop product: a single cluster running HBase that can be deployed across multiple data centers in different regions. Commenting on HBase adoption, Richards cited a 30 percent deployment rate among customers. HBase is used mostly for stock feeds, Twitter streams, and real-time streaming apps.

“We’re seeing a great desire among CIOs to do wholesale replacement of their technology. Analyzing the market is difficult; it’s really tough. One of the proxies for Big Data is hard drive manufacturers, who are up 15 percent. Where Big Data adoption in real production environments is concerned, much like Splunk’s success, players such as Hortonworks and Cloudera will take over the market, even if it’s still dominated by the big whales. What we’re seeing, and what I expect to see, is that companies that traditionally invest in public companies have to move down the stack and invest in private companies.”

Brett Rudenstein, Senior Product Manager of Big Data for WANdisco, added to Richards’ comments across the street on day 2 of Strata Conference 2014 in Santa Clara, California. In WANdisco’s opinion, HBase is a slam dunk.

“HBase is effectively storage for big data applications. Some people call it a key-value store, but the fundamental principle behind it is being able to store billions and billions of rows of data and, at the same time, have (near) real-time access to that data. From a database perspective, it’s often picked because of the level of scale it’s able to achieve and also because it is fundamentally a Hadoop database. Because HBase stores its log files in HDFS, the first thing you need is a hardened HDFS that can withstand failure,” Rudenstein said.

Camp 2: MapR

MapR delivers speed and efficiency, but focuses on problem solving

MapReduce allows you to process Big Data in a distributed, parallel manner. Jack Norris, MapR CMO, joined John Furrier and Dave Vellante on theCUBE during the same coverage of Strata Conference 2014. Norris says MapR maintains “a truly focused business model,” providing innovations and advantages that benefit customers’ bottom line. He notes that Cisco has focused on how it can best leverage its data and is now dramatically expanding its use cases and the ways it derives value from that data. Norris suggests, “sometimes it’s leveraging new data sources, sometimes it’s leveraging the data sources that they have available.”
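To make the one-line description of MapReduce concrete, here is the classic word-count example as a minimal, single-machine Python sketch. It is not MapR’s or Hadoop’s API: a thread pool stands in for cluster nodes, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. It only shows the three steps the model is built on: map each input split to key-value pairs, group the pairs by key, then reduce each group independently.

```python
from collections import defaultdict
from collections.abc import Iterable
from concurrent.futures import ThreadPoolExecutor


def map_phase(line: str) -> list[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input split.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key, as the framework would
    # do between the map and reduce stages.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: list[str], workers: int = 4) -> dict[str, int]:
    # Hypothetical driver: threads stand in for the parallel workers
    # that a real cluster would run on separate nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(map_phase, lines)
        pairs = [pair for chunk in mapped for pair in chunk]
        grouped = shuffle(pairs)
        reduced = pool.map(lambda kv: reduce_phase(*kv), grouped.items())
        return dict(reduced)


if __name__ == "__main__":
    print(word_count(["Big data big", "data lake"]))
```

Because each map call sees only its own split and each reduce call sees only one key’s values, both stages can be spread across many machines; only the shuffle step needs to move data between them.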

“Open source may imply a singular business model, which caused some initial confusion. Still, I believe MapR’s hybrid model has proven its uniqueness and efficiency. It’s easier for folks to get to much faster; there’s been a pretty fast and broad acceptance of that with enterprise customers.”

HP Vertica is not only on board with MapR; it believes analytics should be embedded in everything. Colin Mahony (VP & GM of HP Vertica) and John Schroeder (CEO of MapR) joined us on theCUBE at #BigDataSV, and the conversation got going almost immediately. A CUBE alum, Mahony explained why HP Vertica chose MapR:

“We are really excited about our relationship with MapR; we’re combining two great solutions so that customers who want to take advantage of big data (or any data) can do it seamlessly. What Vertica brings to the table is an incredible MPP SQL analytics platform, but when you think about the big data lake, it just makes sense that you can have a single environment where you can do anything you want against the data. Like with most great partnerships, it’s really customer-driven.”

So what is the state of Big Data as an industry? Schroeder commented:

“If you look specifically at Hadoop, it’s settling down to a couple of platform providers, and we’re the leader there, but I don’t think it’s ready to vertically integrate the stack.”

Camp 3: YARN

YARN matures, set to drive the next generation of Hadoop

The best example of a YARN deployment might be the partnership between Hortonworks and Microsoft, which goes back 18 months. Hortonworks focuses on making Hadoop great, and Microsoft focuses on helping its customers get data out of Hadoop and deliver it to their end users. There was a lot of conversation around YARN last week, so theCUBE host John Furrier asked Eron Kelly, GM Product Marketing – Data Platform at Microsoft, and John Kreisa, VP of Strategic Marketing at Hortonworks, about YARN directly. We wanted to take Kelly’s and Kreisa’s temperature on where YARN stands right now.

John Kreisa said, “YARN is a maturing technology. It’s out in Hadoop 2.0 and now in Hadoop 2.2, which Microsoft is bringing in, and of course Hortonworks Data Platform is really driving the next generation. It allows different technologies to integrate natively and use the resources within the cluster more effectively. Eron talked about the fact that we’re seeing 40-50 percent higher performance on things like queries, which is related to the Stinger project, but also in overall platform and cluster utilization. We’re seeing big enterprises able to reduce, in some cases, the number of nodes they need to run the same workload. It’s a very efficient framework within Hadoop.”

Microsoft has been adamant that it’s going to bring Big Data to 1 billion users, and YARN is going to be a big part of doing so. When asked if he wanted to back off the statement he made 105 days ago, Kelly said:

“The strategy and vision statement still holds, and in fact we’re just really building momentum towards that. With the release of Power BI on Monday, it does make it really, really easy for any user to get access to data on Hadoop and start to do analysis.”

He went on to provide a use case: the City of Barcelona is using Power BI to collect citizens’ Twitter sentiment around festivals and correlate it with the availability of resources, such as whether buses are running on time. It’s already working, too. Recently, a concert in Barcelona ended at 2:00am. People went to the bus stop to catch a bus home, but the buses weren’t there. They started tweeting angrily about the missing buses, and the City of Barcelona was able to catch that sentiment and decide to reroute buses back to them.

Where does the dust settle?

So when the dust settles, here’s what we know: WANdisco is hitched to HBase, HP Vertica is hitched to MapR, and Hortonworks is hitched to YARN. It’s a great time in the industry to be a buyer, because there is so much innovation coming from so many different players. But that breadth of players brings a lot of fragmentation, which makes it harder for buyers to figure out what works in the emerging ecosystem. It is a buyer’s market, with one caveat: do your homework. Right now, solutions are best suited to specific problems with specific desired outcomes.

photo credit: milos milosevic via photopin cc
