Frequently asked questions about large data clouds

July 21, 2009 by Doug Black

Bob Grossman, whose work I wrote about last week for HPCwire, has a post on his blog that addresses some of the FAQs about processing large datasets in a cloud infrastructure. The post is interesting, and if you are at all interested in clouds for large data, or in scientific computing in general, I recommend a read.

In the post he sets out a framework for discussion (what is large data?) and identifies several cloud solutions suited to large data, such as Aster, Sector, Hadoop, and Greenplum.

How do I get started? The easiest way to get started is to download one of the applications and work through some basic examples. The example that most people work through is word count. Another common example is terasort (sorting 10 billion 100-byte records, where the first 10 bytes are the key being sorted and the remaining 90 bytes are the payload). A simple analytic to try is MalStone, which I have described in another post.
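To make the two starter examples concrete, here is a minimal single-machine sketch in Python. It is not code from Hadoop, Sector, or any of the other systems named above. A real framework would run the map and reduce phases of word count in parallel across many nodes and would sort terabyte-scale data in a distributed way. The helper names (`map_words`, `reduce_counts`, `random_record`, `sort_records`) are hypothetical. The random record generator is also an assumption, used only to produce 100-byte records with a 10-byte key and a 90-byte payload; it does not reproduce the exact format of any benchmark data generator.

```python
import os
from collections import Counter
from itertools import chain

# --- Word count, written as a map phase followed by a reduce phase ---

def map_words(line):
    """Map phase: emit a (word, 1) pair for each whitespace-separated token."""
    return [(word, 1) for word in line.split()]

def reduce_counts(pairs):
    """Reduce phase: sum the counts emitted for each distinct word."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

def word_count(lines):
    """Run the map phase over every line, then reduce all emitted pairs."""
    return reduce_counts(chain.from_iterable(map_words(line) for line in lines))

# --- Terasort-style records: 10-byte key followed by a 90-byte payload ---

KEY_LEN = 10
RECORD_LEN = 100

def random_record():
    """Hypothetical generator: 100 random bytes, the first 10 act as the key."""
    return os.urandom(RECORD_LEN)

def sort_records(records):
    """Sort fixed-length records by their first KEY_LEN bytes only."""
    for r in records:
        if len(r) != RECORD_LEN:
            raise ValueError(f"expected {RECORD_LEN}-byte records, got {len(r)}")
    return sorted(records, key=lambda r: r[:KEY_LEN])

if __name__ == "__main__":
    print(word_count(["the cat", "the dog"]))
    recs = sort_records([random_record() for _ in range(5)])
    for r in recs:
        print(r[:KEY_LEN].hex())  # keys print in ascending order
```

Running `word_count(["the cat", "the dog"])` returns `{'the': 2, 'cat': 1, 'dog': 1}`. Keeping the map and reduce steps as separate functions mirrors how the work is split up on a cluster, where many mappers emit pairs that are grouped by key before being reduced.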

I also commend Bob’s blog, From Data to Decisions, to your RSS reader.
