Storage system availability has become an increasing concern as very high-capacity drives have come to market in recent years. While availability has not traditionally been a chief concern in HPC, it is now of paramount importance: large-capacity drives expose systems to multi-day rebuild times and extended periods of vulnerability to data loss under traditional hardware-based RAID.
This is the fourth article in an editorial series on HPC Storage.
The Increasing Trouble with RAID and Disk Rebuild Times
Much has been discussed in the industry in the last few years about the limits of RAID, given the massive growth in the size of data sets as well as in the capacity of disk drives. Drive bit error rates have remained constant while drive capacities have grown rapidly each year: in the last ten years SATA drives have gone from 1 TB to 6 TB, and further increases are inevitable as the hard drive industry continues to drive down the price per GB. Multiplying a constant bit error rate by ever-larger capacities means the expected number of bad sectors per drive, which is already quite high even with RAID 6, will continue to grow. This is compounded by the fact that larger drives take longer to rebuild, leaving the remaining drives unprotected for a longer time. A 7200 RPM drive writes at about 120 MB/sec on average, but slows down as it approaches capacity; even at top speed, a 6 TB drive would take about 15 hours to rebuild.
The reality is that most arrays cannot afford the risks of long rebuilds, which are painful even at top speed, and realistically rebuilds never run at top speed. Assume two to five times longer, which can mean up to 75 hours to rebuild a 6 TB drive; today, 48 hours for a 6 TB rebuild seems to be a commonly accepted figure. In addition, while RAID proponents assume that disk failures are independent events, experience has shown this is not necessarily the case. Disks in an array are often the same age and from the same manufacturer, so one drive failure can mean another is likely.
Below are some sample calculations for average single-drive rebuild times. For the reasons mentioned above, it would be prudent to conclude that disk failures are neither evenly distributed over a disk's lifetime nor necessarily independent events; empirical evidence bears this out. In fact, the additional stress that RAID rebuilds place on drives often causes further failures. In a system with 100 disks, or perhaps 1,000 disks, the risks become unacceptably high.
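The rebuild arithmetic above can be sketched in a few lines of Python. The 120 MB/sec write rate and the 2x to 5x slowdown factors come from the discussion above; the specific drive sizes in the loop are illustrative:

```python
# Rough single-drive rebuild-time estimates, using the article's
# assumptions: ~120 MB/sec average sequential write speed for a
# 7200 RPM drive, and real-world rebuilds running 2x-5x slower
# than the theoretical best case.
WRITE_MB_PER_SEC = 120  # assumed average; drives slow as they fill

def rebuild_hours(capacity_tb, slowdown=1.0):
    """Hours to rewrite a full drive at the assumed rate."""
    seconds = (capacity_tb * 1_000_000) / WRITE_MB_PER_SEC * slowdown
    return seconds / 3600

for tb in (1, 4, 6):
    best = rebuild_hours(tb)
    worst = rebuild_hours(tb, slowdown=5)
    print(f"{tb} TB drive: ~{best:.0f} h best case, ~{worst:.0f} h at 5x slowdown")
```

At a 5x slowdown this puts a 6 TB rebuild near the 75-hour worst case cited above, and it makes clear why the vulnerability window grows linearly with capacity.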
Traditional hardware-based RAID cannot effectively deal with these developments, which have become the norm in the last decade and will only intensify as capacities climb. Storage array vendors have responded by introducing double-parity schemes such as RAID 6. There is, however, one big challenge with traditional hardware RAID 6: it carries a much bigger write-performance penalty than RAID 5. In fact, RAID schemes for traditional architectures come with painful tradeoffs: one can optimize for performance, capacity, or reliability, but not all three simultaneously. With rebuild times of 48 hours common for a 4 TB drive, that leaves 48 hours of vulnerability to a second, third, and fatal failure in the array. With 6 TB drives, the probability of a failure and data loss on a full shelf approaches 0.1% within five years. That risk is simply unacceptable to most organizations.
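To illustrate why a long rebuild window matters, the sketch below estimates the chance that at least one surviving drive fails during a rebuild. It assumes independent failures with a constant annualized failure rate (AFR), and the 2% AFR and 10-drive group size are illustrative assumptions, not figures from the text; since real-world failures are often correlated, the true risk is higher than this simple model suggests:

```python
# Probability that at least one of n surviving drives fails during a
# rebuild window, under an exponential failure model with a constant
# annualized failure rate (AFR). The 2% AFR is an illustrative
# assumption; correlated failures (same age, same batch) make the
# real risk worse.
import math

def p_failure_in_window(afr, hours, n_drives):
    """P(at least one of n drives fails within `hours`)."""
    lam = -math.log(1 - afr) / (365 * 24)   # per-hour failure rate
    p_one = 1 - math.exp(-lam * hours)      # one drive fails in window
    return 1 - (1 - p_one) ** n_drives

# e.g. a 48-hour rebuild in a 10-drive RAID 6 group (9 survivors)
print(f"{p_failure_in_window(0.02, 48, 9):.2%}")
```

Doubling the rebuild window roughly doubles this probability, which is why rebuild time, not just parity count, drives the real exposure to data loss.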
Next week we'll look at different approaches to data protection. If you prefer, you can download the complete insideHPC Guide to HPC Storage in PDF format, courtesy of Panasas.