Talend, an open-source data integration company based in Los Altos with offices in Paris, has raised $40 million from Bpifrance and Iris Capital with participation from existing investor Silver Lake Sumeru. Talend will use the investment to pursue an IPO and be more aggressive in its product roadmap, particularly with a focus on Hadoop technologies. In total, Talend has now raised $101.6 million.
With the funding, Talend will also invest in its inside sales and marketing to build incremental growth from its open-source business. “We are an open source company, we have free downloads,” said Talend CEO Mike Tuchen. “We get tens of thousands of downloads every day. Whenever you do a download based approach, you figure it out after they download — it’s more of a bottoms up approach than streaming a top down segmented approach.”
Bpifrance is a French public investment house that specializes in helping companies through initial public offerings. “We’re building a fast growing market leader which we believe has the potential to go public,” Tuchen said in an email. “Of course, the standard caveats apply about not being able to predict the future.”
Economics are the biggest driver for Talend and others in the big data space, Tuchen said. He said the new approaches to data processing and analytics are 100 times cheaper than data warehouse methods, which in turn creates room for new innovations. Customers now have data that comes from any number of sources. The big issue is what to do with it. Talend's value comes from providing connectors to that data and letting customers take advantage of it without having to run separate deployment, management and monitoring processes. Overall, the company offers data integration, data management and business process management.
For example, the Talend technology runs natively across a Hadoop cluster. Hadoop, for the uninitiated, is an open-source framework that pairs a distributed file system (HDFS) with MapReduce, a system for processing data across a cluster. Talend generates MapReduce code that runs natively on the cluster. From a deployment and scaling perspective, that means no additional code needs to be written, which lowers cost and speeds up the path to analytics.
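For readers who have not seen the MapReduce model, the sketch below walks through its three stages (map, shuffle, reduce) on a tiny word count. It is a single-process illustration in plain Python, not Hadoop code and not the code Talend generates; the function names and the sample input are made up for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle: group all values by key, as a framework would do
    # between the map and reduce stages.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: collapse the values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical input; on a real cluster each line could live on a different node.
counts = word_count(["big data big cluster", "data cluster data"])
print(counts)
```

On a real cluster the map and reduce functions run in parallel on many machines, and the framework handles the shuffle and the movement of data between nodes.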
Hadoop has traditionally been used for batch processing of data. Today, open-source projects like Storm add real-time data streaming capabilities that allow for real-time analytical queries. Over the past year, Storm was integrated with YARN, another open-source project, which transforms Hadoop from a data storage and processing platform into a general computing platform. YARN is considered the next generation of MapReduce, making it possible to do the real-time streaming that companies increasingly need to handle the unprecedented amounts of data they now collect.
According to Talend, the most interesting part about YARN is that it enables the Hadoop platform to become a multi-workload environment, on which different types of processes can be run concurrently. YARN provides resource management, optimization and scheduling, and makes more efficient use of Hadoop. It also provides the ability to run tasks other than batch-oriented MapReduce jobs.
In February, Yahoo put its support behind the Storm and YARN integration. A pioneer in the use of Hadoop, Yahoo makes varied use of it, including personalization of profile information for web and mobile apps.
Talend manages the extract, transform and load (ETL) piece as well as the data quality issues that come with using new technologies like Hadoop with Storm and YARN. The service offers a data transformation process that lets companies then use streaming information in any number of ways.
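To make the ETL and data quality idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not Talend's product or API: the sample records, the field names and the quality rules (a non-empty email and a numeric amount) are all hypothetical.

```python
import csv
import io

# Hypothetical raw input; rows 2 and 3 have data quality problems.
RAW = """id,email,amount
1,Ann@Example.com,10.50
2,,7.25
3,bob@example.com,not-a-number
4,cy@example.com,3.00
"""


def extract(text):
    # Extract: read the raw records from the source.
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    # Transform: normalize fields and set aside rows that fail quality checks.
    clean, rejected = [], []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)
            continue
        if not row["email"]:
            rejected.append(row)
            continue
        clean.append(
            {"id": int(row["id"]), "email": row["email"].lower(), "amount": amount}
        )
    return clean, rejected


def load(rows, target):
    # Load: append the clean rows to the target store (a list stands in here).
    target.extend(rows)
    return len(rows)


warehouse = []
clean, rejected = transform(extract(RAW))
loaded = load(clean, warehouse)
print(f"loaded {loaded} rows, rejected {len(rejected)}")
```

Keeping rejected rows in their own list, rather than dropping them silently, is one simple way to make data quality problems visible to whoever owns the source.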
Amazon Web Services, Google and companies such as Informatica are all seeking to provide their own flavor of analytics for data streaming. Startups, such as Trifacta, which I profiled last week, are providing their own takes on data transformation.
(Feature image courtesy of wlashbrook on Flickr via Creative Commons)