How internet-scale businesses think about big data

August 17, 2009 by Doug Black

Gary Orenstein has an interesting post at GigaOm called “How Yahoo, Facebook, Amazon & Google Think About Big Data.” These companies have all developed their own approaches to storing petabytes of data that, unlike much of the data in high-end computing, actually gets used more than once after it is written.

Yahoo! has MObStor, Facebook has Haystack, Amazon has Dynamo, and then, of course, there is the Google File System.

Since MObStor is the new kid on the block, judging by when information about it was released, let’s take a look at some of its standout characteristics:

It’s designed for petabyte-scale content that is site-generated, partner-generated, or user-generated

It handles tens of thousands of page views every second

Unstructured storage/objects are mostly images, videos, CSS, and JavaScript libraries

Reads dominate writes (most data is WORM: write-once read-many)

Only a low level of consistency is required

It is designed to scale quickly and efficiently

One thing that all of these approaches have in common is really smart software on top of really cheap hardware, which is not how most of the storage technology in HPC is built. It will be interesting to see what happens to our storage technologies as more HPC applications come online to handle the incredible volumes of unstructured data that businesses and researchers increasingly face. I wonder if they will push our community into a crisis akin to the one created by the economics of the shift to commodity CPUs.
