The Race for a Unified Analytics Warehouse

This white paper, “The Race for a Unified Analytics Warehouse,” from our friends over at Vertica argues that the race for a unified analytics warehouse is on. The data warehouse has been around for almost three decades. Shortly after big data platforms were introduced in the late 2000s, there was talk that the data warehouse was dead, but it never went away. When big data platform vendors realized that the data warehouse was here to stay, they started building databases on top of their file systems and conceptualizing a data lake that would replace the data warehouse. It never did.

GigaOm Radar for Evaluating Data Warehouse Platforms

This new GigaOm Radar Report, provided by our friends over at Vertica, examines the leading platforms in the data warehouse marketplace, describes the fundamentals of the technology, identifies key criteria and evaluation metrics by which organizations can evaluate competing platforms, describes some potential technology developments to look out for in the future, and classifies platforms across those criteria and metrics.

Chalk Talk: What is a Data Lake?

“If you think of a data mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.” These “data lake” systems will hold massive amounts of data and be accessible through file and web interfaces. Data protection for data lakes will consist of replicas and will not require backup, since the data is not updated. Erasure coding will be used to protect large data sets and enable fast recovery. Open source software will be used to reduce licensing costs, and compute systems will be optimized for MapReduce analytics. Automated tiering will be employed to meet performance and long-term retention requirements. Cold storage, meaning storage that does not require power for long-term retention, will be introduced in the form of tape or optical media.
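
To make the erasure coding idea above concrete, here is a minimal sketch in Python of the simplest possible erasure code: a single XOR parity block per stripe, which can rebuild any one lost block. This is an illustration only, not how any particular data lake product works. Production systems typically use stronger codes such as Reed-Solomon that survive several simultaneous losses. The function names (`xor_blocks`, `encode`, `recover`) and the sample stripe are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: list[bytes]) -> bytes:
    """Compute one parity block for a stripe of equal-length data blocks."""
    return xor_blocks(data_blocks)


def recover(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])


# Hypothetical stripe of three equal-length data blocks.
stripe = [b"lake", b"data", b"cold"]
parity = encode(stripe)

# Simulate losing the second block, then rebuild it from the rest.
rebuilt = recover([stripe[0], stripe[2]], parity)
```

Storing one parity block for three data blocks costs about 33% extra capacity, compared with 200% for keeping three full replicas, which is one reason erasure coding is attractive for large data sets that are written once and rarely updated.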