This week the Hot Interconnects conference kicked off with a keynote by John Roese, VP and General Manager of Futurewei, Huawei’s North American R&D organization. After the talk, I got a chance to catch up with Roese and ask him about the keynote and where Huawei is headed in the HPC space.
Note: You can watch Roese’s keynote over at inside-Cloud.com.
insideHPC: At the start of your keynote today, you described Huawei as “the largest company no one has ever heard of,” with $32 billion of telecommunications revenue in 2011. What was your message today for the engineers at the Hot Interconnects Conference?
John Roese: There were a lot of things, but the big message was that with everything going on out there, whether it be cloud, virtualized services, or mobility, these are all things we’ve seen before. Given that this audience has a lot of deep expertise, the fundamental point I wanted to make to them was that they need to recognize the patterns. We have tried to solve these problems before. We have tried distributed environments. We have gone through transforming the network, successfully, many times.
Through each of these evolutions, we learned lessons. We’ve learned that overengineered QoS doesn’t work, so keep it simple. We’ve learned that no matter what you think about the predictability of the outcome and how the network is supposed to operate, it never really operates that way.
We’ve learned that if you make assumptions about symmetry and how traffic is going to flow or how a surge is going to be used, it usually reverses itself midstream and the water flow comes back at you. It surprises you.
So my number one point was to get these people, who are working on some really important things right now, to pause for a second and remember the lessons that they’ve learned so that we don’t repeat our old mistakes in the Cloud era.
insideHPC: So would you agree with the notion that there are no new problems, just new engineers working on old problems?
John Roese: I totally agree with that. For this audience at least, everyone has been in the industry for 10, 15, or 20-something years, so they don’t have that excuse.
insideHPC: Earlier today you mentioned working with the folks at CERN on openlab. What are your thoughts on HPC at Huawei?
John Roese: Obviously HPC is a technology area that’s used in many industries: large-scale data processing, research environments, and many other places where those techniques apply. The three main characteristics of an HPC environment are massive distributed compute capacity; massive storage capacity that is, in general, very cost-effective at enormous scale; and a huge amount of bandwidth consumption.
In the Huawei world, we don’t see HPC as a target market that we would build technologies exclusively for. Every one of the things I just described could be applied to 10 other market domains. Let’s take a look at those three.
For the massive network capacity, whether it be HPC or the backbone of a carrier or the core of a cloud infrastructure, we just launched a whole set of products that are 96 ports of 100 GigE on a single device. Why? Because people want a lot of bandwidth. When we give them a bandwidth step, they use it all.
We’re pretty excited about those types of technologies in the HPC environment. They give you low cost, high capacity, a new bandwidth step, and extremely high density in a single platform.
insideHPC: What about on the Compute side?
John Roese: On the compute side, we’re moving to rack-scale servers. These are very interesting in a number of areas, but in HPC they are designed to solve the issues of power and cooling. They aggregate power and cooling across the rack rather than per node, which solves the intrinsic problems you have when you run thousands of compute nodes in a datacenter for HPC.
And then on the storage side, some of the most interesting work we are doing is with CERN and openlab. We are developing technology that will dramatically change the scaling and the cost of storage. So the principles we are using are:
- Move away from enterprise-class storage architectures to what is essentially consumer-grade storage. With consumer drives, you can reduce the cost of storage as much as possible.
- Eliminate RAID and shift to a software resiliency architecture, a distributed architecture.
- Get away from complex legacy file systems and move to an object-store environment. To be able to find your data, move to a DHT (distributed hash table) model for how you actually organize the system, with potentially millions of drives connected across millions of nodes in a highly distributed architecture.
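Roese doesn’t describe how the openlab system is implemented, but the general idea behind a DHT-organized object store with software replication can be sketched. The Python below is a minimal, hypothetical illustration assuming a textbook consistent-hashing ring: node names and object keys are hashed onto the same ring, and each object is placed on the first three distinct nodes found clockwise from its key. The class `HashRing`, the node names, the 100 virtual points per node, and the three-way replication factor are all assumptions for illustration, not details of the Huawei/CERN design.

```python
# Hypothetical sketch of DHT-style object placement via consistent hashing.
# Not the Huawei/CERN openlab implementation; all names and parameters are assumed.
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on a 2**64 ring (first 8 bytes of SHA-1)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hashing ring where each node owns many virtual points."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes   # assumed number of virtual points per node
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Scatter the node's virtual points around the ring to spread load.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        # Dropping a node only affects objects whose walk passed its points.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def replicas(self, object_key, count=3):
        """Return `count` distinct nodes clockwise from the key's ring position."""
        if count > len(set(self._owner.values())):
            raise ValueError("not enough nodes for the requested replica count")
        i = bisect.bisect(self._points, _hash(object_key))
        chosen = []
        while len(chosen) < count:
            node = self._owner[self._points[i % len(self._points)]]
            if node not in chosen:  # replicas must land on different nodes
                chosen.append(node)
            i += 1
        return chosen
```

For example, `HashRing([f"node-{n}" for n in range(8)]).replicas("run-2012/event-000042")` returns three distinct node names. Keeping copies on separate nodes in software, rather than behind a RAID controller, means a failed drive or node costs one replica instead of the data, and because adding or removing a node only moves the keys next to that node’s points, the system can keep growing without a central file-system index.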
The net result is a system that can scale to 100 petabytes or more, at a cost threshold significantly lower than anything an enterprise-class storage environment can reach. Why do you need that? Well, if you’re in a place like CERN, they actually want to hold on to everything forever.
Can you imagine if, at CERN, the event that revealed the Higgs boson got lost because you only had one copy or you ran out of storage? Those scenarios are not really acceptable.
So what they want is almost infinite storage, extremely low cost, and the ability to scale to the size of their experiment. We’re talking about extreme bandwidth and multiple versions of the same data, and they don’t want to have to think about the boundaries of the storage system or limitations on the experiment. We’ve got to do that with new architectures for storage.