Whitepaper: Experiences with the Hadoop File System

February 21, 2011 by Doug Black


In this whitepaper, Yahoo engineers Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler examine HDFS, the file system component of Hadoop. Although the HDFS interface is patterned after the UNIX file system, faithfulness to standards was sacrificed in favor of improved application performance.

Abstract—The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size. We describe the architecture of HDFS and report on experience using HDFS to manage 25 petabytes of enterprise data at Yahoo!

A tip of the hat goes to the new Systems We Make blog, which seeks to chronicle the boom in distributed systems being built in both academia and industry.
