This is the fifth article in a series from the editors of insideHPC on HPC storage. This week we explore different approaches to data storage.
A different approach to data protection is clearly required if the limitations of hardware-based RAID are to be overcome. Erasure codes are a software-based method of error correction used for data protection. With erasure codes, data is broken into fragments that are stored across different disks and nodes in the storage system. Data that becomes corrupted is rebuilt using fragments stored elsewhere across the array. Redundant symbols are added to the data so that a rebuild can be accomplished from a sufficient number of verified fragments. For example, in a 10/16 error correction scheme, six extra symbols are added to the ten base symbols, and the resulting 16 symbols are striped across 16 drives or nodes. The original file can be reconstructed from any ten of those drives, so the scheme tolerates the loss of up to six.
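To make the 10/16 example concrete, here is a minimal sketch of an erasure code in Python. It is an illustration under stated assumptions, not the method used by any particular storage product: it works Reed-Solomon style over the small prime field GF(257), whereas production systems typically use optimized arithmetic over GF(2^8). The names `encode`, `decode`, and `_interpolate_at` are hypothetical and chosen only for this example.

```python
# Toy (k, n) erasure code over the prime field GF(257), Reed-Solomon style.
# Illustration only: names and field choice are assumptions for this sketch.
P = 257  # prime modulus; every byte value 0..255 fits in the field


def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) points, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (requires Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Extend k data symbols to n symbols. The data symbols are kept as-is
    at positions 0..k-1; parity symbols fill positions k..n-1."""
    k = len(data)
    base = list(enumerate(data))
    parity = [_interpolate_at(base, x) for x in range(k, n)]
    return list(data) + parity


def decode(survivors, k):
    """Rebuild the k data symbols from any k surviving (index, value) pairs."""
    if len(survivors) < k:
        raise ValueError("need at least k fragments to rebuild")
    points = survivors[:k]
    return [_interpolate_at(points, x) for x in range(k)]


# Ten base symbols extended to 16, one per drive or node.
data = [72, 80, 67, 32, 115, 116, 111, 114, 97, 103]
fragments = encode(data, 16)

# Simulate losing six of the 16 drives, then rebuild from the other ten.
lost = {0, 3, 5, 8, 12, 15}
survivors = [(i, v) for i, v in enumerate(fragments) if i not in lost]
assert decode(survivors, 10) == data
```

In this sketch the ten data symbols define a polynomial, and the six parity symbols are further points on the same curve. Any ten points are enough to recover the polynomial, which is why any ten of the 16 fragments suffice for a rebuild. The cost is capacity: 16 symbols are stored for every ten symbols of data, an overhead of 60 percent.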
Ease of Management
An ideal scale-out NAS system should be easy to deploy, manage, and use. Beyond providing a single global namespace, it should be able to scale non-disruptively, with nodes added without downtime through automatic discovery and automatic load balancing. It goes without saying that a single point of management is necessary if management is to be easy and uncomplicated.
The management interface should display all critical system information and let the administrator drill down into status details. It should report on overall system state, error messages, capacity and disk utilization, throughput, and response times. It should be possible to configure storage as virtual volumes within the file system and to assign quotas to groups, users, or applications. The interface should be able to administer a hundred nodes as easily as a single node.
Next week we review the Panasas ActiveStor 16 solution for HPC storage. If you prefer, you can download the complete insideHPC Guide to HPC Storage in PDF format.