Henry Newman on Why Tape is Dead

In this video from the Nov. 17 Seagate User Group at SC14, Henry Newman from Instrumental presents: Tape vs. Disk 2015-2020.

Transcript:
So, I’m the last speaker between you and the opening gala event, so I will talk quickly. Ken heard about a study we did for a government agency looking at long-term archival issues and what should be the archival technology of the future. I think Gary brought up a really good point. I had a discussion with him, I think it was about five years ago, that was sponsored, and he said, “I’m going to erasure codes, tape is dead.”

And I argued with him, but Gary was right, and it’s really moving pretty quickly. How do we move this? Is it off, on? Can you advance? All right, thank you. So let me give you some background on why I have come to this conclusion. I’ve worked for 25 years on HSMs, and I love HSMs and I love tape. And I’ve worked on things going back to CFS and UniTree, DMF, SAM-QFS, HPSS. Name your archive system, I’ve worked on it, just about.

I have written about tape and tape’s advantages over disk for 15 years. I used to get a lot of feedback from Seagate people that I should listen to them about disks. But I have come to the conclusion that in the next few years, we’ve got a problem, and I’m going to put a challenge to Seagate on how to deal with it. So tape’s market is shrinking. It’s about investment. There’s a lot of money in disk. We talk $30 billion. I think that it’s a little higher, about $36 billion. But if you look at the archival market, which is what drives tape and drives development (and R&D takes money), in 2008 you had basically a billion-dollar-plus market for LTO. And today, you have less than $500 million, or around $500 million.

We’ve gotten to a point, in my opinion, where if you look at the technology and the amount of investment needed to move it forward, there’s not enough money and profit to move the technology forward at a rate that will compete with disk. If you look at the big issue with disk right now, and I think Gary brought it up, it’s dollars per gigabyte per second. An enterprise tape drive is $25,000 to $35,000. You get 250 megabytes a second out of it, uncompressed. With HPC data, maybe you have some compressible HPC data, you get 800. A disk drive is $450 to $460. I know what Ken’s margins are. Maybe he bumps it to $500 to $600.

I don’t know, Bill, what you paid? But today you’re getting about 175 megabytes a second out of a single disk drive. In a big data world you’ve got to move your data back from archive, which people do to reprocess it, because you don’t know what you don’t know about your data. If any of you remember the 80s, the things between the chromosomes were called junk DNA, and I was dealing with people dealing with junk DNA, and we now find that that DNA is what’s used for replication.
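
As a rough illustration of the dollars-per-bandwidth comparison, the drive prices and streaming rates quoted in the talk can be turned into dollars per megabyte per second. This is only a sketch: the function name is made up for the example, it counts drive price alone (no media, libraries, enclosures, or power), and it assumes each drive sustains its quoted rate.

```python
def dollars_per_mb_per_s(price_usd: float, throughput_mb_s: float) -> float:
    """Drive price divided by streaming rate (hypothetical helper, drive cost only)."""
    return price_usd / throughput_mb_s


# Figures as quoted in the talk; everything else here is an assumption.
tape_low = dollars_per_mb_per_s(25_000, 250)          # uncompressed, low price
tape_high = dollars_per_mb_per_s(35_000, 250)         # uncompressed, high price
tape_compressed = dollars_per_mb_per_s(25_000, 800)   # compressible data, low price
disk_low = dollars_per_mb_per_s(450, 175)             # quoted drive price
disk_high = dollars_per_mb_per_s(600, 175)            # price after the margin "bump"

print(f"tape, uncompressed: ${tape_low:.2f} to ${tape_high:.2f} per MB/s")
print(f"tape, compressible: ${tape_compressed:.2f} per MB/s")
print(f"disk:               ${disk_low:.2f} to ${disk_high:.2f} per MB/s")
```

On these quoted numbers alone, a single enterprise tape drive costs well over ten times more per MB/s than a single disk drive for uncompressed data; a full comparison would also have to include media and system costs, which the sketch leaves out.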

There’s lots of data out there that you’re going to have to bring back. You’re going to have to figure out what it is and how to use it, and it’s going to cost way too much money with tape. Tape has some other issues coming that are becoming extremely costly, and that’s migration costs. You migrate from one tape to another tape, it takes years off it. This goes back to the dollars per megabyte per second issue. The difference between even LTO tape and disk in dollars per megabyte per second is a factor of five, easily a factor of five. Disk performance is growing faster than tape performance. At every generation, you’re getting at least a 20% improvement. Tape performance doesn’t grow that fast.

There are connectivity and bandwidth issues. You can’t stripe tapes, except there’s one software product that allows for that. Bill and his team worked for four, five, six years to get– well, you’re fat. It seemed like longer. But there’s no striping of tape. So if you have a petabyte file and you don’t have striping, which you only have in one tape product, how do you get the data back? How do you get it back efficiently? So the tape software ecosystem, because of the investment dollars, is not being invested in. Whereas look at the number of file systems that we’ve got, object systems that we’ve got, in the last decade, in the last five years.

So it’s about investment and it’s about money. I’ve written about this for over a decade. Tape is a more reliable medium than disk. The hard error rates are better. It’s transportable. There are lots of people I know who ship tapes around the country via sneakernet, and we’ve all heard about FedEx laws and things like that. It has advantages: the longevity. Now, they’ll say that tape is good for 30 years. We all know that that is actually incorrect; maybe you can get ten years out of a tape before the interface dies and you can’t do a read-back. Tape also, except for quite recently, does not have a common on-tape format. A lot of the HSMs use very specific formats, and they’re not transportable: without that HSM you can’t read the data.

So, here are my challenges to Seagate, to migrate from a tape world to a disk world. You’ve got to improve the data integrity. And you’ve got to tell people what the integrity is. You get a rate, everybody says, “Oh, I’ve got erasure codes.” Well, what are the erasure codes good for? What are the failure conditions? What are my reliability conditions? Am I good to ten to the twentieth bits, ten to the twenty-eighth bits, ten to the thirtieth bits? Am I good for one disk failure? Am I good for multiple disk failures? These kinds of things need to be open and disclosed.
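
One way to picture the kind of disclosure being asked for: an erasure-coded layout with k data shards and m parity shards can be rebuilt after at most m shard losses, so the question "one disk failure or multiple?" comes down to m and to how likely more than m failures are. The sketch below is not any vendor's model. The function name, the 8+2 and 8+3 layouts, and the 2% per-disk failure probability are all hypothetical, and it assumes disks fail independently, which real systems often violate.

```python
from math import comb


def loss_probability(k: int, m: int, p: float) -> float:
    """Probability that more than m of the k+m shards fail.

    Assumes each shard sits on its own disk and each disk fails
    independently with probability p over the period of interest
    (both are simplifying assumptions for illustration).
    """
    n = k + m
    # Probability that at most m shards fail, i.e. the data is still recoverable.
    survive = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m + 1))
    return 1.0 - survive


# Hypothetical layouts and failure probability, chosen only for illustration.
for k, m in [(8, 1), (8, 2), (8, 3)]:
    print(f"{k}+{m}: tolerates {m} failures, "
          f"loss probability {loss_probability(k, m, 0.02):.2e}")
```

Publishing numbers of this kind, together with the assumptions behind them, is the sort of openness the challenge calls for.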

Address tape’s cost advantages. Now, tape as a medium still has some cost advantages, but what’s the cost, in a real study, of migrating the data, in dollars per megabyte per second? How fast do I need to get it back? Say I have a disaster, and I need to get a bunch of data back to figure out how to reorganise the electric lines after a hurricane, after I pull on the UAV and have all the data about it, and I run a linear programming algorithm to figure out the optimal way to put the electric system in. If I’ve got one file on tape and I can read just one tape at a time, at 160 or 200 megabytes per second, that’s a problem.
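
To put a number on why a single stream is a problem, here is a back-of-the-envelope sketch of how long a restore takes at the quoted rates. The one-petabyte size echoes the petabyte file mentioned earlier in the talk; the helper name, the decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes), and the assumption of a perfectly sustained rate with no mount, seek, or tape-change time are all simplifications made for this example.

```python
def restore_days(size_bytes: float, rate_mb_s: float, streams: int = 1) -> float:
    """Days needed to read size_bytes at rate_mb_s per stream (hypothetical helper)."""
    seconds = size_bytes / (rate_mb_s * 1e6 * streams)
    return seconds / 86_400  # seconds per day


PETABYTE = 1e15  # decimal petabyte, an assumption for this sketch

for rate in (160, 200):
    print(f"1 PB at {rate} MB/s, one stream: {restore_days(PETABYTE, rate):.1f} days")

# With striping across 10 hypothetical parallel streams, the time drops tenfold.
print(f"1 PB at 200 MB/s, 10 streams: {restore_days(PETABYTE, 200, streams=10):.1f} days")
```

Even under these generous assumptions, a single unstriped stream needs roughly two months or more to return one petabyte.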

You can read why faster petabytes-a-second are coming. We are at terabytes a second now. Address tape’s archival advantages. Studies have shown that disk drives fail in five years; tapes you can put on the shelf for much longer. How is Seagate going to address this? How is the disk industry going to address this to replace tape? And last but not least, having an on-disk format that’s transportable between systems, because there are still people that are going to need to ship disk drives around just like they shipped tape drives around, and having a common on-disk format, a common media format. So I think I met my timeline, Ken. I’ll entertain any questions. This is a long study. I have a lot more data on this. Anybody who has any questions, I’ll be happy to answer them now or later in the week.
