Cloudera’s “magnificent obsession,” that of building a single, universal data platform with a single, unified control plane, is something the company has thought about and worked toward for several years. But only recently has the company made moves that may make its vision a reality.
A fully realized Cloudera platform that delivers the cloud experience to data anywhere, for AI everywhere, was the focus of the company’s recent EVOLVE25 conference in New York. The platform’s objective is to enable companies with sprawling data environments to rationalize and use their data, whether it resides on or off premises and regardless of cloud platform or geographic location.
Corralling sprawling data sets means, among other things, that organizations can access, organize, and use their data to train AI models and extract maximum value from it.
Founded in 2008, Cloudera provides a platform for hybrid data management, security, analytics, and AI, used to manage and analyze large data sets across public, private, and edge environments. Its platform addresses a range of use cases, including predictive analytics, fraud detection, risk management, supply chain optimization, and real-time AI workloads.
Cloudera is built for larger corporations facing some of the most complex data management challenges. As Cloudera CEO Charles Sansbury explained, larger companies live in a hybrid data world. Some of the company’s customers have data lakes estimated at 3 exabytes, and one customer, a major petroleum company, has 120,000 end users around the globe.
What they want is a single view of all that data, and a single avenue to using it, wherever it may be.

“If you look at our customer base, using t-shirt sizes, we do XL and double XL,” Sansbury told us. “We don’t really do L that much, and small and medium are not in the discussion. And so our customers have data all over the world. They want to be able to bring a management layer of data mesh to all those data sources, because they didn’t want to take all their data, move it to the cloud, lose control of it, and have to do all that to achieve their AI-based initiatives.”
The Cloudera platform has a pedigree of utilizing open-source projects, including Hadoop Distributed File System (HDFS), Spark, and Iceberg, all integrated but open for expansion through custom APIs and SDKs. Its adoption of the Model Context Protocol (MCP), an open standard for connecting AI models and agents to external data and tools, lets AI work with hybrid data without siloing it. Developers, analysts, and data scientists use these extensible tools to build analytical solutions, machine learning models, and data-powered applications quickly.
And now the Cloudera platform could be poised to take the final step toward universality, expanding its reach to deliver a cloud experience to data anywhere, something Cloudera had regarded as possibly more aspirational than achievable. It could become available within the next six to nine months.
Key to Cloudera’s hybrid incarnation are two acquisitions announced in the last 12 months.
The first is the August acquisition of Taikun, a Prague, Czech Republic-based provider of a platform for managing Kubernetes and cloud infrastructure across hybrid and multi-cloud environments.
Cloudera said Taikun’s technology provides an integrated compute layer that unifies deployment and operations across the IT stack, delivering a cloud-like experience anywhere. The result:
- Customers can deploy data and AI workloads in the data center, in the cloud, or in hybrid environments without sacrificing performance or freedom of choice, according to the company. Taikun supports highly regulated environments such as GovCloud, Sovereign Cloud, and air-gapped data centers, with offerings that span cloud, data, and services.
- The integrated compute layer enables upgrades without downtime, improving efficiency and reducing operational risk.
- Customers can adopt Cloudera and partner technologies faster by taking a “bring your own engine” approach, integrating tools and databases from Cloudera and its partner ecosystem, from Cloudera Data Services and open-source technologies such as Spark, HBase, Ozone, Kafka, and Trino to third-party graph databases.
- Cloudera said its cloud-anywhere architecture expands deployment options and support, ensuring long-term flexibility as business mandates evolve.

“This acquisition marks a pivotal step in our mission to bring the cloud experience wherever enterprise data resides,” said Sansbury. “By integrating Taikun’s container-native platform in our stack, we are removing operational barriers and enabling our customers to unlock faster insights, make smarter decisions, and drive real-time action in every corner of their business.”
Another piece of the Cloudera puzzle is last November’s acquisition of the Israeli company Octopai, a data lineage and catalog platform that provides data discovery and governance for data-driven decision making. The aim, in the company’s words, is “transforming data to deliver trusted and predictive insights.”
The effectiveness of AI and generative AI depends on collecting vast amounts of data and on understanding and leveraging it. The goal of the Octopai acquisition is to make it easier to understand, access, and leverage data across an organization’s entire data estate, including data outside of Cloudera, to drive robust data, analytics, and AI applications. Octopai delivers access to trusted data so organizations can build AI models and applications by combining data from anywhere in their environments.
Founded in 2016, Octopai entered the metadata management arena with automated data mapping and knowledge graphs to activate metadata and help data analysts gain insights into the data landscape. This capability, coupled with AI copilots, is designed to accelerate the use of high-quality data for analytic and AI outcomes. The goal is to help organizations save time on change or impact analysis, reduce errors and costs in their data operations, and comply with regulations.
Integrating Taikun and Octopai capabilities into Cloudera could mark a major evolutionary step for the platform. Initially conceived as a unified platform merging Hadoop, Spark, and other open-source tools, it could become a hybrid, AI-centric ecosystem that leverages containerization, Kubernetes-native operators, and multi-cloud networking capabilities, including Private Link Networks for secure connectivity and workload isolation.
At EVOLVE25, we discussed Cloudera’s strategy with industry analyst Patrick Moorhead, CEO of Moor Insights & Strategy, who said the company could be on the verge of something significant.
“The future is hybrid and multi-cloud,” he said. “If you subscribe to that future, then it follows that data management must be hybrid. Effective data management is vital for successful AI projects and digital transformation. The myriad of point solutions we have collected over the years must go as integration costs, and the lack of hybrid support will make them cost prohibitive.”
Cloudera, he said, is working to address these challenges.
“Data is dirty and data is ugly,” Moorhead said. “You’ve got to bring it in, you need to clean it up, and then you tag it, and maybe you’re going to tag it using regular tags where you’re going to do meta tags. And then you need to bring it in to get trained, and then you spit it out and do inference on it. Then you need to deploy it, deploy it somewhere else. Cloudera has a full end-to-end pipeline, and with these new services [Taikun and Octopai] adds to that. They’re checking all the boxes for an end-to-end AI pipeline.”
Now Cloudera just needs to get its platform across the hybrid finish line.