HadoopDB combines MapReduce and DBMS


From Dr. Dobb’s late last week comes news of research at Yale University that combines a DBMS with MapReduce to create a hybrid system the researchers feel is ideal for analytical workloads:

Traditional approaches to managing data at this scale typically fall into one of two categories. The first includes parallel database management systems (DBMS), which are good at working with structured data that contain, for instance, tables with trillions of rows of data. The second includes the kind of approach taken by MapReduce, the software framework used by Google to search data contained on the Web, which gives the user more control over how the data is retrieved.

HadoopDB reduces the time it takes to perform some typical tasks from days to hours, making more complicated analysis possible — the kind that could be used to find patterns in the stock market, earthquakes, consumer behavior and even outbreaks, Abadi said. “People have all this data, but they’re not using it in the most efficient or useful way.”

The research was presented at the Very Large Data Bases (VLDB) conference in Lyon, France, on August 27. The article has more information on the approach. We have written about other competitors in the large-scale data management space, notably Pervasive and its DataRush product.
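
For readers who want a feel for what "combining a DBMS with MapReduce" can mean, here is a toy sketch in Python. It is not HadoopDB's code and makes no claim about how the Yale system is built. It only assumes the general idea of keeping each data partition in its own database, pushing a SQL aggregation down to every partition (the map step), and merging the partial answers (the reduce step). The `sales` table, the in-memory SQLite "nodes", and all function names are made up for illustration.

```python
import sqlite3
from collections import defaultdict


def make_node(rows):
    """Build one hypothetical worker node: an in-memory SQLite database
    holding a single partition of an illustrative `sales` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def map_phase(node):
    """Map step: let the node's own DBMS do the local aggregation in SQL
    and return partial (region, subtotal) pairs."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return node.execute(query).fetchall()


def reduce_phase(partials):
    """Reduce step: merge the per-node subtotals into global totals."""
    totals = defaultdict(float)
    for partial in partials:
        for region, subtotal in partial:
            totals[region] += subtotal
    return dict(totals)


def run_query(partitions):
    """Run the toy map/reduce flow over a list of row partitions."""
    nodes = [make_node(rows) for rows in partitions]
    try:
        return reduce_phase([map_phase(node) for node in nodes])
    finally:
        for node in nodes:
            node.close()


if __name__ == "__main__":
    partitions = [
        [("east", 10.0), ("west", 5.0)],
        [("east", 2.5), ("north", 4.0)],
    ]
    print(run_query(partitions))
```

In this sketch the SQL engine handles the structured, per-partition work while the surrounding Python plays the role of the MapReduce coordinator. A real system would spread the nodes across machines and deal with scheduling and failures, none of which is modeled here.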