How WANdisco enables Big Data in geographically distributed …

There’s a lot of excitement around Hadoop, but at present only a portion of operational clusters are in production, due primarily to a lack of enterprise features such as disaster recovery and continuous availability. WANdisco chief marketing officer Jim Campigli and CTO Jagane Sundar stopped by theCUBE during SiliconANGLE’s BigDataSV meetup to discuss with host John Furrier how their firm is helping to change that.

WANdisco offers a network solution called Non-Stop Hadoop that allows customers to stretch their deployments across multiple data centers in order to maximize data availability and provide universal access to information, Campigli says.

“What you have with our Non-Stop Hadoop solution is a package that enables you to have active data centers in multiple locations,” he elaborates. “So 100 percent uptime is critical, it’s key, and it’s the core of what we provide – that ability to not have to worry about disaster recovery, to not have to worry about data availability, failover, those kinds of things.” The solution also enables data consistency, he adds, meaning that all end-users share a single version of the truth that is automatically updated as soon as a change is made.

The software is based on WANdisco’s patented active-active replication technology, which is in turn an implementation of the notoriously complex Paxos fault-tolerance algorithm, and recently received an update adding support for the Apache HBase non-relational Hadoop database. The open source platform has come a long way in the last few years, Sundar notes, to the point that it is now reliable enough to support mission-critical transactional applications that have historically run against Oracle or DB2. And that’s saying a lot.

“If it’s critical data, good enough is not good enough. You need guarantees, you need mathematical certainty that your data is replicated and available – that’s what Paxos brings to the table. Our own internal implementation of Paxos has added enhancements [that] add additional value to this; we’re able to offer guarantees that most other vendors cannot,” he tells Furrier.
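The guarantee Sundar describes comes from Paxos’s majority-quorum design: a value is only considered chosen once a majority of acceptors agree on it, so any later proposer is forced to converge on the same value. The following is a minimal single-decree sketch in Python – an illustrative toy, not WANdisco’s implementation; the class and function names here are hypothetical:

```python
# Minimal single-decree Paxos sketch (illustrative only).
# Acceptors promise to ignore ballots lower than the highest one seen;
# a value is chosen once a majority accept the same ballot.

class Acceptor:
    def __init__(self):
        self.promised = -1          # highest ballot promised so far
        self.accepted = (-1, None)  # (ballot, value) last accepted

    def prepare(self, ballot):
        """Phase 1b: promise, returning any previously accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot, value):
        """Phase 2b: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors, ballot, value):
    """Run one Paxos round; return the chosen value, or None on failure."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: gather promises from a majority of acceptors.
    responses = [a.prepare(ballot) for a in acceptors]
    granted = [acc for ok, acc in responses if ok]
    if len(granted) < majority:
        return None

    # A proposer must adopt the value of the highest-ballot prior acceptance,
    # which is what prevents two data centers from "choosing" different values.
    prior = max(granted, key=lambda ba: ba[0])
    if prior[0] >= 0:
        value = prior[1]

    # Phase 2: ask the acceptors to accept the (possibly adopted) value.
    accepted = sum(a.accept(ballot, value) for a in acceptors)
    return value if accepted >= majority else None


acceptors = [Acceptor() for _ in range(5)]
print(propose(acceptors, ballot=1, value="replicate-block-42"))  # → replicate-block-42
# A competing proposer with a higher ballot converges on the already-chosen value:
print(propose(acceptors, ballot=2, value="something-else"))      # → replicate-block-42
```

The second call shows the safety property at work: even a later, higher-numbered proposal cannot overwrite a value a majority has already accepted.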

Non-Stop Hadoop removes many disaster recovery bottlenecks, Sundar continues, effectively eliminating distance limitations and thus ensuring that data remains accessible even in the event of widespread disruption affecting an entire region. It’s also simpler than alternative solutions – he mentions EMC’s SRDF (Symmetrix Remote Data Facility) in particular – and lets developers build applications without having to worry about traditional operational risks. According to Campigli, that makes it possible to run real-time apps on Hadoop that would not be otherwise viable from a technological standpoint.

Another major selling point of the product is that it’s compatible with all of the components in the Hadoop ecosystem, including SQL solutions, which Sundar believes will hit mainstream adoption in the foreseeable future. He also forecasts consolidation in the distribution space and to a lesser degree in the emerging analytical applications market, which Tresata CEO Abhi Mehta views as the next frontier for Big Data innovation.

See Campigli’s entire segment below.

photo credit: marsmet547 via photopin cc

About Maria Deutscher

Maria Deutscher is a staff writer for SiliconAngle covering the enterprise cloud space. If you have a story idea or news tip, please send it to @SiliconAngle on Twitter.

