Yahoo to Push Supercomputing


It appears Yahoo is opening up some of its computer infrastructure for supercomputing research.

Sunnyvale-based Yahoo (Nasdaq:YHOO) said the program is intended to leverage its leadership in Hadoop, an open source distributed computing sub-project of the Apache Software Foundation, to enable researchers to modify and evaluate the systems software running on a 4,000-processor supercomputer provided by Yahoo.

“Unlike other companies and traditional supercomputing centers, which focus on providing users with computers for running applications and for coursework, Yahoo’s program focuses on pushing the boundaries of large-scale systems software research,” the company said.

Hadoop appears to be some sort of software-based RAID; there isn't a lot of information on the website, and it appears to be a small project.

Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data.

Here’s what makes Hadoop especially useful:

  • Scalable: Hadoop can reliably store and process petabytes.
  • Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.
  • Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located. This makes it extremely rapid.
  • Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.
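To make the map/distribute/reduce flow behind those bullet points concrete, here is a toy sketch in Python. This is not Hadoop's actual (Java) API; the input lines and word-count task are invented for illustration, and everything runs in one process, standing in for the map, shuffle, and reduce phases that Hadoop distributes across cluster nodes.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input: lines of text, standing in for splits read from HDFS.
lines = [
    "yahoo pushes hadoop",
    "hadoop processes data",
    "data data everywhere",
]

# Map phase: each mapper emits (word, 1) pairs for its split of the input.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: the framework groups emitted pairs by key between map and reduce.
mapped.sort(key=itemgetter(0))
grouped = {
    key: [value for _, value in group]
    for key, group in groupby(mapped, key=itemgetter(0))
}

# Reduce phase: each reducer sums the counts for one word.
counts = {word: sum(values) for word, values in grouped.items()}

print(counts)  # {'data': 3, 'everywhere': 1, 'hadoop': 2, ...}
```

The point of the model is the bullets above: because each mapper works only on its own chunk of data, the work parallelizes across thousands of commodity nodes, and a failed task can simply be rerun on another copy of the data.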

Comments

  1. Hadoop is a lot more than a RAID. It is also being used by Google and IBM to teach courses on distributed computing. Hadoop is the open source version of Google’s MapReduce method for large-scale distributed computing, which amongst other things powers Google Maps. It’s a beautiful concept, ultra-scalable, and it does not require you to be a distributed programming guru, although there is a third component (the framework) which tends to be proprietary to the companies deploying MapReduce/Hadoop etc.

    Don’t have access to lots of processors? Just get an EC2 account 🙂