Lustre and ZFS to Power New Parallel File System at LLNL


Today RAID Inc. announced a contract to provide Lawrence Livermore National Laboratory (LLNL) with a custom parallel file system solution for its unclassified computing environment. RAID will deliver a 17PB file system able to sustain up to 180 GB/s. The high-performance, cost-effective solution is designed to meet LLNL's current and future demands for parallel-access data storage.

According to Mark Gary, Deputy Division Leader in Livermore Computing, this new file system infrastructure “will be deployed in support of cutting edge application development and large-scale scientific simulation in LLNL’s unclassified environment.”

LLNL has built a world-class high-performance computing ecosystem designed to address a range of complex computational challenges, including high-impact problems of national concern.

“RAID’s tested parallel file system solutions are designed to help accelerate LLNL’s HPC leadership in building the next generation of open source production environments,” said Robert Picardi, CEO of RAID Inc. “And there are many commercial applications which will benefit from highly scalable, cost efficient and high performance storage solutions.”

Additional details:

  • The parallel file system will run Lustre 2.8 with ZFS OSDs and multiple metadata servers (a sketch of how multiple metadata servers are exercised appears after this list).
  • The Lustre file system comprises 36 OSS nodes, each capable of 5 GB/s of sustained data performance, and 16 metadata servers with 25 TB of SSD storage capacity; the aggregate arithmetic is sanity-checked in the sketch after this list.
  • The solution is anchored by enterprise-class 4U 84-bay 12G SAS JBODs, LSI/Avago 12G SAS adapters, Mellanox EDR InfiniBand, HGST 12G enterprise SAS disk drives, and Intel server technologies.
  • The file system incorporates six scalable storage units (SSUs), each containing six Lustre OSS nodes and six 4U 84-bay JBODs holding 480 8TB SAS drives. The solution employs ZFS on Linux with raidz2 double-parity protection (a provisioning sketch follows this list). Resiliency is provided by multipath connectivity and high-availability failover, intended to eliminate single points of failure.
  • An additional software layer manipulates tunable drive settings in the same way RAID controller manufacturers fine-tune disk firmware for their enclosures. It not only squeezes every bit of performance out of the drives but also provides extensive diagnostic reporting to catch, and potentially fix, problems long before they affect data flow and integrity (an illustrative diagnostics poll appears after this list).
  • LLNL’s HPC facility consists of numerous computer platforms and file systems spanning multiple buildings and operating at multiple security levels.
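
For readers who want to sanity-check the headline figures, here is a minimal back-of-the-envelope sketch in Python. The node counts, per-node bandwidth, and drive counts come from the details above; the 10-wide raidz2 vdev layout (8 data + 2 parity drives) is an assumption for illustration, not a detail from the announcement.

```python
# Back-of-the-envelope check of the announced figures.
# Assumption (not from the announcement): 10-wide raidz2 vdevs,
# i.e. 8 data drives + 2 parity drives per vdev.

OSS_NODES = 36
GBPS_PER_OSS = 5                 # sustained GB/s per OSS node
SSUS = 6                         # scalable storage units
DRIVES_PER_SSU = 480
DRIVE_TB = 8
RAIDZ2_WIDTH = 10                # hypothetical vdev width
DATA_FRACTION = (RAIDZ2_WIDTH - 2) / RAIDZ2_WIDTH  # raidz2 spends 2 drives on parity

aggregate_bw = OSS_NODES * GBPS_PER_OSS       # 180 GB/s, matching the announcement
raw_tb = SSUS * DRIVES_PER_SSU * DRIVE_TB     # 23,040 TB raw
usable_tb = raw_tb * DATA_FRACTION            # ~18,400 TB before ZFS overhead

print(f"Aggregate bandwidth: {aggregate_bw} GB/s")
print(f"Raw capacity:        {raw_tb / 1000:.1f} PB")
print(f"Post-parity usable:  {usable_tb / 1000:.1f} PB (consistent with ~17 PB after ZFS overhead)")
```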
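
The announcement does not spell out how the multiple metadata servers are put to work, but in Lustre this is the role of the Distributed Namespace (DNE) feature, whose second phase (striped directories) arrived with the 2.8 release. The sketch below shows the two standard `lfs mkdir` idioms; the mount point and directory names are placeholders.

```python
# Hypothetical illustration of Lustre DNE usage across multiple MDTs.
# The mount point and directory names are placeholders.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# DNE phase 1: place a directory on a specific MDT so its metadata
# load lands on that server rather than on MDT0.
run(["lfs", "mkdir", "-i", "1", "/mnt/lustre1/project-a"])

# DNE phase 2 (new in Lustre 2.8): stripe a single directory across
# several MDTs, spreading metadata operations for very large directories.
run(["lfs", "mkdir", "-c", "4", "/mnt/lustre1/shared-scratch"])
```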
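
For readers unfamiliar with ZFS-backed Lustre, the following sketch shows how a single OST might be provisioned on a raidz2 vdev of multipathed drives. The device paths, pool and file system names, vdev width, and MGS address are all hypothetical; the commands follow standard `mkfs.lustre` usage with `--backfstype=zfs`.

```python
# Hypothetical provisioning sketch for one ZFS-backed Lustre OST.
# All names below are placeholders, not details from the announcement.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Ten multipathed 8TB SAS drives grouped into one raidz2 vdev
# (the 10-wide layout is assumed for illustration).
drives = [f"/dev/mapper/mpath{chr(ord('a') + i)}" for i in range(10)]

# With --backfstype=zfs, mkfs.lustre creates the zpool and the OST
# dataset in one step; raidz2 gives double-parity protection.
run(["mkfs.lustre", "--ost", "--backfstype=zfs",
     "--fsname=lustre1", "--index=0", "--mgsnode=mgs@o2ib",
     "ostpool/ost0", "raidz2", *drives])

# Mounting the dataset brings the OST into service.
run(["mount", "-t", "lustre", "ostpool/ost0", "/mnt/ost0"])
```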
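
RAID has not named its drive-tuning and diagnostics software, so the sketch below is only an illustration of the early-warning idea using the standard `smartctl` CLI from smartmontools (version 7 or later for JSON output). The device list is a placeholder.

```python
# Illustrative SMART health poll; not RAID Inc.'s actual tooling.
import json
import subprocess

def smart_ok(device):
    """Return True if the drive's overall SMART health check passes."""
    out = subprocess.run(["smartctl", "--json", "-H", device],
                         capture_output=True, text=True).stdout
    return json.loads(out).get("smart_status", {}).get("passed", False)

# Placeholder device list; in practice this would enumerate every
# drive in the JBODs via multipath-aware paths.
for dev in ["/dev/sda", "/dev/sdb"]:
    print(f"{dev}: {'healthy' if smart_ok(dev) else 'needs attention'}")
```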


Comments

  1. Interesting that ZFS is being used for enterprise-grade solutions. It’s been notoriously hard to estimate its reliability for desktop/consumer/non-large-scale usage.