Henry Newman on Why Backblaze is Still Wrong About Disk Reliability

Over at Enterprise Storage Forum, Henry Newman comes out of the shortest retirement period since Brett Favre in order to take Backblaze to task on their recent disk reliability study.

Brett Favre and Henry Newman Come out of Retirement. Again.

Exactly one year ago today I published an article titled Selecting a Disk Drive: How Not to Do Research. This article took Backblaze to task for a lack of intellectual rigor in their disk drive studies. Their latest study raised my ire enough to write today’s blog – even though I said I was retired from writing. Here’s my first point from a year ago: “Let’s talk about the release data first. The oldest drive in the list is the Seagate Barracuda 1.5 TB drive from 2006. A drive that is almost 8 years old! Since it is well known in study after study that disk drives last about 5 years and no other drive is that old, I find it pretty disingenuous to leave out that information. Add to this that the Seagate 1.5 TB has a well-known problem that Seagate publicly admitted to, and it is no surprise that these old drives are failing.”

Henry goes on to say that these drives are still being used and still being reported on by Backblaze. So one has to wonder why, if the drives are so bad, they are still being used and reported on year after year.

With a known lower hard error rate, why would Backblaze use consumer drives for an enterprise application? Maybe they do not think their users’ backup data is that important and believe that all that is needed is consumer drives. Nothing has changed from last year; there is no comparison of consumer and enterprise drives. There seems to be a belief, with no supporting evidence, that enterprise drives are more expensive with no benefit. The Backblaze approach seems to be that they don’t want real research to get in the way of their opinions. Maybe they should join the Flat Earth Society while they’re at it.

As you may recall, Henry went on a rant on this topic last February over at Radio Free HPC.

Comments

  1. Here’s the response I left there, which seems to have been swallowed by “moderation” and is unlikely to survive the experience.

    === begin quote
    Even if attribution of motive (“they do not think their users’ backup data is that important”) weren’t inherently fallacious, this one’s particularly off base. They are using RAID 6, so even with the worst drives in their system they’re unlikely to lose data. Should they incur a large expense to replace all of those drives immediately instead of letting them age out? Should they use erasure codes instead of RAID 6? Good questions, certainly, but well beyond the threshold implied by your accusation. At least they provided empirical data, no matter how flawed you might consider it. When you don’t even cite (let alone generate) any data yourself, “don’t want real research” and “Flat Earth Society” look more than a bit hypocritical. If we judge people by the enemies they make, your response only makes Backblaze look better.
    === end quote

  2. Backblaze doesn’t need to care about reliability of individual drives, so much as overall integrity of the data. If they have sufficient redundancy through RAID, quick drive replacement, etc., then customer data will remain safe. The wide variety of drive makes and ages helps diversify expected failure dates, compared with buying lots of the same drive in big single batches.

    Meanwhile, they’re trying to run a business, so overpaying for drives or retiring them early just to keep the drive failure rate down is a waste of money. It was only a few years ago that they had a hard time buying enough drives (at any price) to meet demand for their service. They’re trying to be a low-cost leader (unlimited storage for flat pricing) so they need to cut costs where they can. So long as they don’t lose customer data it doesn’t really matter how they go about it. They solve the problem with redundancy so they can use the drives all the way to end of life.
