In this video from LUG 2015 in Denver, Stephen Skory from Seagate Technology presents: Understanding Hadoop Performance on Lustre.
For many reasons, running Hadoop within a Lustre environment is a technically attractive proposition. Workflows usually involve multiple steps combining traditional High Performance Computing (HPC) applications and output with Hadoop ecosystem analytics. The POSIX compliance of Lustre eliminates the need to move data in and out of the Hadoop Distributed File System (HDFS) and simplifies all aspects of data management. The design of HDFS keeps computation and storage together, while the decoupling of storage and computation in Lustre allows for more flexibility in system design. However, due to fundamental design differences between Lustre and HDFS, enabling Hadoop to utilize Lustre presents challenges, as well as new opportunities. In this talk, Seagate presents details on its efforts and achievements in improving Hadoop performance on Lustre, including: a summary of why and how HDFS and Lustre differ, and how those differences affect Hadoop performance on Lustre compared to HDFS; Hadoop ecosystem benchmarks and best practices on HDFS and Lustre; Seagate's open-source efforts to enhance the performance of Lustre within "diskless" compute nodes, involving core Hadoop source code modification (and the unexpected results); and general takeaways on making Hadoop run faster on Lustre.
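Because Lustre is POSIX-compliant, Hadoop can be pointed directly at a Lustre mount through its built-in local-filesystem support, rather than staging data into HDFS. As a minimal sketch, the `core-site.xml` fragment below sets `fs.defaultFS` to a `file://` URI; the mount path `/mnt/lustre/hadoop` is a hypothetical example, and production deployments (including the approach discussed in the talk) typically involve further tuning and adapter code beyond this.

```xml
<!-- core-site.xml: point Hadoop at a POSIX filesystem (e.g. a Lustre mount)
     instead of HDFS. The path /mnt/lustre/hadoop is a hypothetical example. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>file:///mnt/lustre/hadoop</value>
  </property>
</configuration>
```

With this in place, MapReduce jobs read and write the shared Lustre namespace directly, which is what removes the HDFS copy-in/copy-out step mentioned above.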
To better understand how Lustre works and how your organization can benefit from it, download the Inside Lustre white paper now.