Pivotal continues to release important new solutions that help companies make better use of big data. This week, we have two major announcements: one helps Hadoop distributions deal with real-time transactional data more effectively, and the other helps provision any kind of big data set, including data from Hadoop, MPP, flat files or a legacy database. The new solutions are known as Pivotal GemFire XD and Pivotal Data Dispatch (Pivotal DD), respectively.
Memory access can be on the order of 100,000 times faster than disk access, so it makes sense that many companies are looking to use in-memory data stores like GemFire as part of their big data infrastructure. Last week we shared how GemFire could be used to improve the performance and extend the life of legacy data stores, and also how one bank used GemFire to keep trading data real-time and overcome latency issues to give customers reliable financial transactions.
Apache Hadoop, with its reputation for munching and crunching big data at unprecedented speeds, was actually designed for offline analysis. However, as evidenced by newer projects such as Storm and Spark gaining attention in the market, there is a movement underway to employ Hadoop technologies in more real-time solutions. GemFire XD takes Pivotal’s proven in-memory data store and applies it to Hadoop for Java workloads. The result is that companies can now combine the power of storing and processing data in memory with the scale-out persistence provided by Pivotal Hadoop.
Essentially, GemFire XD will allow applications using Hadoop data to access hundreds of gigabytes of data in memory in a single process without incurring the penalties usually associated with garbage collection in a JVM. With that much data held in memory, GemFire XD can execute stored procedures in parallel, close to the data they operate on. This reduces network I/O and lets everything from MapReduce-like functions to arbitrary application logic run more efficiently.
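To make the idea of running logic next to the data more concrete, here is a minimal sketch in plain Java. It does not use the GemFire XD API; the class name `DataLocalAggregation`, the method `parallelSum`, and the use of in-process lists as stand-ins for per-node memory partitions are all assumptions made for illustration. Each partition computes its own partial result in parallel, and only those small partial results are combined, which is the pattern that keeps network traffic low in a real distributed store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only: not the GemFire XD API.
public class DataLocalAggregation {

    // Each inner list stands in for the rows one node holds in memory.
    // Every partition is summed by its own task; only one number per
    // partition travels back to the caller.
    static long parallelSum(List<List<Long>> partitions) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (List<Long> partition : partitions) {
                tasks.add(() -> {
                    long local = 0;
                    for (long value : partition) {
                        local += value;
                    }
                    return local; // partial result for this partition
                });
            }
            long total = 0;
            // invokeAll blocks until every partition task has finished.
            for (Future<Long> partial : pool.invokeAll(tasks)) {
                total += partial.get();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("Partition task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Long>> partitions = Arrays.asList(
                Arrays.asList(1L, 2L, 3L),
                Arrays.asList(10L, 20L),
                Arrays.asList(100L));
        System.out.println(parallelSum(partitions));
    }
}
```

In a real deployment the partitions would live on different machines, so the saving comes from shipping a small function to the data instead of pulling large volumes of rows across the network.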
The new product is packaged as a beta and is expected to be available as an add-on for the new release of Pivotal HD 1.1, due out around November 1, 2013.
Pivotal Data Dispatch (Pivotal DD)
Developed in partnership with NYSE Technologies, Pivotal Data Dispatch is aimed squarely at the big data information worker. The idea of this product is to provide data analysts with an easy way to provision various big data sets from any source, including Hadoop, MPP, flat files or legacy databases. Allowing for both internal and external data sets, this solution creates a self-service sandbox for data workers.
With security and access control for sensitive data, analysts can browse a catalog of disparate data platforms registered by IT for fast access to an electronic library of files and database systems. Systems may also include Hadoop-based platforms, data grids and cache services, MPP appliances, message queues, and internal or external clouds.
Pivotal DD is a full-fledged analytics app running on Pivotal GPDB, Pivotal HD and HAWQ. Most easily thought of as a logical data warehouse, it is also set up to allow IT to build in a governance system, with a single metadata repository for unified access, security, lineage, and auditability.
Since 2007, the NYSE has been using Pivotal Data Dispatch to provision millions of files and terabytes of data per day, in near real-time.
Pivotal Data Dispatch will be generally available in Q4 2013.
Posted By Stacey Schneider
Stacey Schneider is the managing editor for the GoPivotal blog. She has over 18 years of experience working with technology, with a focus on large-scale, customer-facing systems as well as internationalization. Schneider has held roles in services, engineering, products and marketing across both enterprise software and open source companies. For the past several years, she has been working for VMware and now Pivotal to help evangelize how companies build applications in the cloud faster, with bigger data and nearly unlimited scale. Prior to VMware, Schneider ran marketing for open source Hyperic (now owned by VMware). She also held various positions at CRM software pioneer Siebel Systems, including Group Director of Technology Product Marketing, a role in which her contributions earned her a patent. Follow her on Twitter @sparkystacey.