Cloudera plans to launch data science software, cloud services

Big data software company Cloudera will announce new data science software later this week at the Strata + Hadoop World conference in San Jose, VentureBeat has learned.

The company will launch Data Science Workbench software that helps data scientists and data engineers work together and integrates with Python, R, H2O, and other tools, two sources familiar with the matter told VentureBeat. This follows Cloudera’s 2016 acquisition of startup Sense, which built a data science collaboration tool.

Cloudera was keen to build data science software before the Sense acquisition but ultimately acquired a tool rather than releasing its own, Charles Zedlewski, Cloudera’s senior vice president of products, told VentureBeat in an interview. Since the acquisition, Cloudera has developed integrations with Apache Spark, Kerberos, and the Hadoop Distributed File System (HDFS), Zedlewski said.

Cloud-based and on-premises versions of the software will be available, Zedlewski said. A private beta became available three months ago, and 30 customers are on the waiting list for it, he said. Competitors include Domino Data Lab.

Additionally, Cloudera is preparing to unveil new cloud-based services, which VentureBeat reported on in August.

Among the new products is a managed version of its Enterprise Data Hub (EDH) that will run on the Amazon Web Services (AWS) public cloud, the first source said. Cloudera will handle the management of nodes so customers don’t have to. This will distinguish it from AWS’ own Elastic MapReduce (EMR) and other cloud services that operate versions of certain components of Hadoop, the open source software for storing and processing large volumes of many different kinds of data.

There will also be a cloud-based version of Cloudera’s Impala massively parallel processing (MPP) engine, with which people will be able to run queries on data stored in Amazon’s widely used S3 storage service, the first source said. The launch will follow AWS’ introduction of the Athena querying service. Impala got support for Amazon S3 last year, Zedlewski said.

And there will be an Altus metadata service running on AWS infrastructure, sources said. The name is a reference to the Apache Atlas open source metadata and data governance software. The sources were not sure when the three cloud services would launch.

Cloudera brought in $330 million in revenue in its 2016 fiscal year, which ended in January, the first source said. Zedlewski wouldn’t comment on that, nor would he talk about the upcoming services other than the Data Science Workbench. He did say the company is always looking at ways to make Cloudera software easier to run on public clouds. In the past three years it has become more popular to run the company’s software on public clouds as opposed to in on-premises data centers, Zedlewski said.

Last week Bloomberg reported that Cloudera had filed confidential paperwork for an initial public offering (IPO). One of Cloudera’s competitors in the Hadoop world, Hortonworks, went public in 2014.

Update on March 14: Cloudera issued a statement on the new Data Science Workbench.
