insideHPC: Ken, I think we should start at the beginning. What’s happened since we last talked?
Ken Claffey: So one of the things that we’ve really been very proud of, in terms of our progress, particularly in EMEA over the last 12 months, is that we’ve deployed a number of really significant systems. If you remember, when we were last together at SC15 in Austin, one of the big pieces of news that we were very proud of was our presence in the top 10: four of those systems are powered by Seagate. Even more impressive is that 100% of the newest systems are powered by Seagate. When you peel that layer back just a little bit further, three of those four systems are from Europe and the Middle East.
So, we have KAUST in the Middle East, and we also have HLRS and the Swiss Supercomputing Center. On top of that, we’ve also done major deployments at CEA in France, who were an early adopter of our ClusterStor L300, our highest-performing Lustre platform, which also uses our HPC drive, which we also announced at SC15.
insideHPC: So, Ken, Lustre certainly has a strong presence in the TOP500, but having Seagate there in the top 10, the best of the best, sounds like being the vendor of choice for these ultra-scale machines.
Ken Claffey: Yeah, if you look at it, I think no other vendor, from a storage perspective, has more than a single system, and Seagate has four. And I just noticed in the updated TOP500 list this morning that Stampede, which was previously number ten, has dropped out of the top ten. One thing that you may not have picked up in the news is that, just about ten days ago or so, TACC announced Stampede 2, which I’m sure will put TACC back into the top ten when it comes in next year. And Seagate was also selected for that.
insideHPC: Okay, so maybe we can switch gears here a little bit and talk about HPC challenges in general. What are your customers telling you about their big challenges out there for storage?
Ken Claffey: Yeah, from a macro-level perspective, the challenges that we see from HPC users really mirror what we’re seeing in the broader enterprise. So, taking a macro-Seagate view for a second, what we’re seeing is this data deluge, where across all segments of the IT market you’ve got more and more data being generated. And nowhere is that more evident than in HPC, of course, where you’ve got supercomputers generating massive amounts of data. And it’s really signaling the movement away from a compute-centric model to a data-centric model, which I’m sure is something that you recognize. But when you’ve got that growing amount of data, it’s all against the backdrop of, “Hey, hang on a second, my budget is actually pretty challenged.” You’ve got storage becoming a bigger and bigger percentage of the budget, against compute: “My budget’s getting pretty compressed.” So, how do you find that balance with, “Hey, because of the data deluge I need massive amounts of capacity, but I also need a lot of performance”? And there’s a lot of confusion amongst end-users as to what the best architecture is.
I don’t think we as vendors, frankly, are helping that situation right now. I think we’re actually creating a lot more of that confusion. On one hand, customers are being told Flash solves every problem, but Flash’s key attribute is that it’s really good for certain aspects of performance. Of course, that’s if your applications can take advantage of it, which is not necessarily so easy, and it still doesn’t solve my capacity problem. Ultimately, for customers to solve this data deluge problem, they need more performance, more capacity, reliability, and security, all the other things that matter to HPC customers. They’re going to need the right combination of the right technology, the right tool for the job, and that’s really some combination of Flash and HDDs. In fact, from a Seagate ClusterStor perspective, from day one we were shipping with a hybrid architecture. So every building block of our architecture that we shipped always had a combination of Flash and HDD. So while that may appear to be a new architecture from other vendors, it’s part of the reason that we’ve been so successful in the top 10 and, indeed, the TOP500.
insideHPC: So Ken, I’m hearing a lot about all-Flash arrays. As you know, here at ISC we’re all about supercomputing, the very fastest. Why not just throw all Flash at the thing, and as much money as possible? Why not just go that route?
Ken Claffey: I think you said the key words there: “as much money as possible.” What users are really struggling with is this: on one hand, I’d like to build an all-Flash system; on the other, my budget is challenged, my budget’s constrained, and by the way, I also need lots of capacity. So, if you have an unlimited budget, you’re probably going to build a storage architecture that is 100% Flash. And indeed, we have done that for some customers.
However, the majority of customers are going to need to find a much more efficient architecture. The starting point is, if I’m buying storage and capacity, let’s make sure I get as much performance as possible out of the HDD systems that I’m buying. That makes perfect sense, right? You’re going to buy them for capacity anyway, so let’s get as much performance out of them as possible. And then, why don’t we supplement that with various amounts of Flash, depending on what my workload is? Because in some cases, adding Flash may not actually give you any benefit at all. It’s really important to understand the overall supercomputer architecture, the budget constraints, and how much capacity is needed, and then you can look at the workload and say, “Right, will Flash give me a benefit or not?”
insideHPC: Well, great. Being here at the ISC conference, what do you have looking forward as you talk to customers this week?
Ken Claffey: Basically, Rich, we’re starting to have early conversations with customers as they look towards the end of this decade and start really thinking about exascale. This is an area where Europe is actually at the forefront, and Seagate is heavily involved in that. We were awarded the exascale IO contract as part of the Horizon 2020 program. That project is called Sage, and in it Seagate has brought together a broad ecosystem of partners, ranging from end-users, application vendors, ISPs, CPU vendors, and interconnect vendors to server vendors. We all work on it together as we look at the challenge of exascale within the context of that data-centric model.
And you know, when you think of exascale, it’s not just scale in terms of horizontal scale, with hundreds of thousands, maybe millions of compute cores; it’s also the vertical scale from a storage perspective. We’ve got to be careful that we don’t add too many storage layers. As you know, when we talk to customers, there is always a struggle with just backing up to tape, and then you start thinking about adding more and more layers in the memory hierarchy, plus high-performance disk and tape.
What we’re really thinking about, as a key premise of Sage, is what we call rampant layering violation. How do we reduce, not increase, the number of layers? Instead of having that storage architecture made up of discrete pieces of software stacks, how do we bring that together into more of a homogeneous environment? Because as you know, data movement is both expensive and extremely challenging. Another aspect of data movement is this: how do you not just take the data and ship it to the supercomputer? How do you take advantage of the processing that, more and more, now exists within these storage systems, and actually do some of the processing closer to the data? So a lot of these conversations we’re having with our customers are shaping that path towards exascale.