There are always different levels of importance assigned to the data files in a computer system, especially in a very large system that stores petabytes of data. To make the best use of the fastest storage, Hierarchical Storage Management (HSM) was developed to keep data within easy reach of users while placing it on storage of the appropriate speed and price. This paper explains how Lustre HSM is an industry-leading system for dealing with these challenges.
Data used by large-scale computer systems must reside as close to the CPUs as possible so that the CPUs can do their job: processing that data. Data that is less important or accessed infrequently, however, can be stored on slower storage systems. Migrating data (files) from fast to slower to slowest storage, and from the most expensive to the least expensive, is where an HSM system comes into play. One of the most important features of Lustre HSM is that the data is always available, and is listed as available by various tools. Although access time may vary depending on the level at which the data is stored, users will not lose track of their data files.
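The idea that a file stays listed even after its contents have moved to a slower tier can be illustrated with a small toy model. This is only a sketch under stated assumptions: the names `ToyHsm`, `FileEntry`, `archive_file`, `release`, and `read` are hypothetical and chosen for illustration, and the code does not use or reproduce any Lustre interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FileEntry:
    """Hypothetical namespace entry: it stays visible on every tier."""
    name: str
    data: Optional[bytes]   # None once the contents leave the fast tier
    archived: bool = False  # True once a copy exists on the slow tier


class ToyHsm:
    """Toy two-tier store: a fast tier plus a slower archive (illustrative only)."""

    def __init__(self) -> None:
        self.namespace: Dict[str, FileEntry] = {}  # entries are never removed
        self.archive: Dict[str, bytes] = {}        # slow, cheap tier

    def create(self, name: str, data: bytes) -> None:
        self.namespace[name] = FileEntry(name, data)

    def archive_file(self, name: str) -> None:
        """Copy the contents to the slow tier; the fast copy is kept."""
        entry = self.namespace[name]
        if entry.data is not None:
            self.archive[name] = entry.data
            entry.archived = True

    def release(self, name: str) -> None:
        """Free the fast-tier copy, but only if an archive copy exists."""
        entry = self.namespace[name]
        if not entry.archived:
            raise ValueError("refusing to release a file with no archive copy")
        entry.data = None

    def read(self, name: str) -> bytes:
        """Return the contents, restoring them from the archive if needed."""
        entry = self.namespace[name]
        if entry.data is None:
            entry.data = self.archive[name]  # slower path: restore first
        return entry.data

    def listing(self) -> List[str]:
        """The listing is the same whichever tier holds the contents."""
        return sorted(self.namespace)
```

In this model, releasing a file frees space on the fast tier without removing its name from the listing, and the first read after a release takes the slower restore path. That mirrors the behavior described above, where access time varies but the file never disappears from view.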
HSM functionality is now available in Lustre® 2.5, meeting one of the main requirements often voiced by the commercial technical computing community, which has traditionally relied on proprietary, full-featured parallel file systems such as IBM’s GPFS. Lustre is now one of the most successful open source projects to date, with more than 70 active developers and representation from close to 20 companies and organizations.
The whitepaper Inside Lustre HSM, from Seagate, explains in detail how the different HSM components work together to migrate data in both directions. The diagrams make it easy to understand how the Lustre file system works in tandem with Lustre HSM to provide an easy-to-use and reliable way to make maximum use of storage capacities and capabilities.