This is the second article in a series from the editors of insideHPC on HPC storage. This week we explore the increasing popularity of scale-out NAS in HPC environments.
Scale-out NAS (Network Attached Storage) is very well suited to applications that require high throughput and high I/O rates – such as those in energy, finance, government, life sciences, manufacturing, media, and university research.
Deploying large-scale NAS systems has traditionally presented challenges for technical workloads. As previously noted, once the system capacity is reached with traditional scale-up deployments, an upgrade or additional storage island is required.
Scale-out storage architectures have become more popular in recent years as a solution to these scale-up challenges. The scale-out approach eliminates the limitations of traditional scale-up architectures, along with inefficiencies such as “forklift upgrades” when a larger system is required. With a scale-out approach, it is possible to start with a small number of nodes and independently scale both bandwidth and capacity as required.
Scale-out systems address performance and scalability needs by adding nodes; however, managing these systems can be difficult without tools and features designed for that scale. Without a global namespace, adaptive automation, and centralized management capabilities, to name a few, scale-out systems can be just as difficult to manage as scale-up systems. The real benefits of scale-out NAS are realized only when management at scale is built into the scale-out architecture.
Linear Scalability
A good scale-out NAS system requires distributed intelligence, such that each controller node can see every other controller node and all storage nodes. This allows controller nodes and storage nodes to work in concert to provide access to any device in the global namespace. A fast interconnect is required between the nodes and the storage devices.
In addition to the global namespace, scale-out storage requires good load balancing. As nodes are added, or as workloads change, work must be redistributed so that the load per node stays roughly even. Linear scalability cannot be achieved with performance hotspots and bottlenecks.
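As an illustration only, the sketch below shows one common way that placement and rebalancing can be handled: a consistent-hash ring lets any controller resolve a file path to a storage node without a central lookup, and adding a node relocates only a small fraction of the data. The HashRing class, node names, and paths here are hypothetical; this is not a description of any particular vendor's implementation.

```python
import bisect
import hashlib

def h(key: str) -> int:
    """Map a string to a point on a 2**32 hash ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2**32)

class HashRing:
    """Toy consistent-hash ring: any controller with the same node list
    resolves a file path to the same storage node, with no central lookup."""
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []                       # sorted list of (point, node) pairs
        for n in nodes:
            self.add_node(n)

    def add_node(self, node):
        for i in range(self.vnodes):         # virtual nodes smooth out the load
            bisect.insort(self.ring, (h(f"{node}#{i}"), node))

    def locate(self, path):
        point = h(path)
        idx = bisect.bisect(self.ring, (point, "")) % len(self.ring)
        return self.ring[idx][1]

# Any controller can resolve any path identically:
ring = HashRing(["storage-1", "storage-2", "storage-3"])
paths = [f"/proj/run{i}/out.dat" for i in range(10000)]
before = {p: ring.locate(p) for p in paths}

# Adding a fourth node moves only ~1/4 of the files -- capacity and
# bandwidth grow without reshuffling the whole namespace.
ring.add_node("storage-4")
moved = sum(1 for p in paths if ring.locate(p) != before[p])
print(f"files relocated after adding a node: {moved}/{len(paths)}")
```

Real systems layer data protection, striping, and hotspot detection on top of placement like this, but the key property is the same: growth adds resources without a global reshuffle.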
Architecture for Performance and Scalability
Today’s storage systems make use of solid state drives, which perform very well for small random I/O operations and have a lower cost per I/O than traditional hard disk drives. However, the cost per GB for solid state drives is still roughly 10X higher than for traditional hard disk drives. Hybrid storage, on the other hand, can deliver large performance gains at relatively low cost, using only a small percentage of flash storage relative to spinning disk.
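To make that cost arithmetic concrete, the short calculation below blends an assumed HDD price with an SSD price set at 10X that figure, matching the ratio cited above; the dollar amounts themselves are illustrative assumptions, not quoted prices.

```python
# Illustrative $/GB arithmetic for a hybrid tier (prices are assumptions,
# chosen only to reflect the ~10X SSD-vs-HDD cost ratio noted above).
HDD_COST_PER_GB = 0.03
SSD_COST_PER_GB = 10 * HDD_COST_PER_GB

def blended_cost(flash_fraction: float) -> float:
    """Capacity-weighted cost per GB for a hybrid SSD/HDD pool."""
    return (flash_fraction * SSD_COST_PER_GB
            + (1.0 - flash_fraction) * HDD_COST_PER_GB)

for flash in (0.0, 0.05, 0.10, 1.00):
    print(f"{flash:>4.0%} flash -> ${blended_cost(flash):.3f}/GB")
# A pool with 10% flash costs about 1.9X a pure-HDD pool, versus 10X for
# all-flash, while the flash tier absorbs most of the small random I/O.
```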
Hybrid storage represents the best balance of performance, capacity, and cost. This is an ideal combination for HPC and Big Data applications, as it combines the performance of flash with the cost effectiveness of hard disk drives, while also reducing overall power consumption. For hybrid storage to be effective, an efficient tiering mechanism is required. Many storage systems deploy a mix of solid state and hard drives, often with the intent of using the SSDs as a cache. However, caching software is difficult to implement efficiently. In a hybrid storage system, a cache miss rate of just five percent increases the average access time by roughly 250%, assuming a 50-to-1 difference in performance between SSD and HDD devices. This underscores the fact that caching algorithms need to be exceptionally accurate when SSD is used as a cache.
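The arithmetic behind that figure is worth spelling out; the sketch below works it through under the same assumptions, using one relative time unit per SSD access and fifty per HDD access.

```python
# Worked example of the cache-miss arithmetic above (time units are relative:
# 1 unit per SSD access, 50 units per HDD access, per the 50:1 assumption).
SSD_TIME = 1.0
HDD_TIME = 50.0

def avg_access_time(miss_rate: float) -> float:
    """Expected access time when cache misses fall through to HDD."""
    return (1.0 - miss_rate) * SSD_TIME + miss_rate * HDD_TIME

all_hits = avg_access_time(0.00)      # 1.0
five_pct = avg_access_time(0.05)      # 0.95*1 + 0.05*50 = 3.45
print(f"average access time at 5% misses: {five_pct:.2f} units "
      f"({(five_pct / all_hits - 1) * 100:.0f}% slower than all-SSD)")
# -> roughly a 245% increase, i.e. the ~250% figure cited above.
```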
Next week’s article will explore strategies to maintain HPC storage performance at scale. If you prefer, you can download the entire insideHPC Guide to HPC Storage for free, courtesy of Panasas. You can download this PDF now from the insideHPC White Paper Library.