In this special guest feature, Intel’s John Hengeveld describes the effort that went into building the fastest supercomputer on Earth.
MilkyWay 2 is amazing, and what's more amazing is that you can build a similar system for your own needs.
So by now you have all heard about the MilkyWay 2 system in China that surprised the world and took the #1 spot on the TOP500 list. The Intel team has been working on many fronts in the past year: the upcoming Intel Xeon Processor E5-2600 V2 family, a major expansion of the Intel Xeon Phi product family, and expanded versions of our industry-leading software development toolkits. We're in something like 98% of the new systems on the TOP500 list, so seriously, we need a rest.
All of these things are coming together at ISC13. Much of this work shows up in the MilkyWay 2 system, but what's more important is that this technology will very soon be available from a broad range of suppliers, ready to be applied to a wide range of industrial and scientific technical computing applications.
The processor and coprocessor components that Intel proposed and shipped in production for MilkyWay 2 are being announced and demonstrated by Intel for the first time at ISC today. The as-yet-unlaunched Intel Xeon Processor E5-2600 V2 product family will be demonstrated for the first time in the Intel booth, on a live 52-node cluster showing high-fidelity visualizations of an Audi RS5 vehicle design.
During ISC’13, Raj Hazra announced the general availability of five new Intel Xeon Phi coprocessor products, including the 3100 family featured in the MilkyWay 2 system. You can watch the video right here on insideHPC.
I want to tell a short story about the MilkyWay 2 system that you probably haven’t heard, and show why being at Intel is the coolest thing ever.
About a year ago, the NUDT folks, led by Professor Liao, had a really good idea of how they wanted to build the world's biggest supercomputer: the next generation of their proprietary fabric, combined with Intel's latest processors, to meet their objectives for power efficiency and performance.
They gave Intel and other vendors goals for programmability, node power, and node performance, all with very tight constraints. Then they dropped the big challenge: “and it all has to work and be #1 in the world by June 2013.” Intel proposed a solution that met not only all their time and performance requirements but also their programmability requirements.
What they wanted, they later described as a “Neo-Heterogeneous Architecture”: a system with two tiers of heterogeneous hardware, driven by one consistent programming model and parallelism abstraction. This allows much faster development of applications that scale to very high levels of parallelism.
Intel Xeon Phi coprocessors offered hardware with the required performance and energy efficiency while removing the need for a separate programming model for the second tier, which is what made their neo-heterogeneous architectural vision possible.
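To make the neo-heterogeneous idea a little more concrete, here is a deliberately toy Python sketch. It is not NUDT's or Intel's software: the tier names, the `saxpy` kernel, the helper functions, and the 1:3 throughput ratio are all hypothetical. What it illustrates is the principle: a single kernel is written once, and the only per-tier decision is how much of the data each tier gets. On the real hardware that role is played by ordinary C, C++, or Fortran code using standard parallel constructs such as OpenMP, built for both the Xeon host and the Xeon Phi coprocessor.

```python
from typing import Dict, List


def saxpy(a: float, x: List[float], y: List[float]) -> List[float]:
    """One kernel (hypothetical example), written once and reused unchanged on every tier."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def split_by_throughput(n: int, throughput: Dict[str, float]) -> Dict[str, range]:
    """Statically partition n work items across tiers in proportion to relative throughput."""
    total = sum(throughput.values())
    tiers = list(throughput)
    ranges: Dict[str, range] = {}
    start, cumulative = 0, 0.0
    for i, tier in enumerate(tiers):
        cumulative += throughput[tier]
        # Boundaries come from the cumulative share, so they never decrease;
        # the last tier is pinned to n so every item is assigned exactly once.
        end = n if i == len(tiers) - 1 else round(n * cumulative / total)
        ranges[tier] = range(start, end)
        start = end
    return ranges


def run_two_tier(a: float, x: List[float], y: List[float],
                 throughput: Dict[str, float]) -> List[float]:
    """Run the same kernel on each tier's slice and stitch the results back together."""
    out = [0.0] * len(x)
    for r in split_by_throughput(len(x), throughput).values():
        # Only the data slice differs per tier; the code path is identical.
        out[r.start:r.stop] = saxpy(a, x[r.start:r.stop], y[r.start:r.stop])
    return out


if __name__ == "__main__":
    # Hypothetical 1:3 host-to-coprocessor throughput ratio, for illustration only.
    weights = {"host": 1.0, "coprocessor": 3.0}
    print(split_by_throughput(8, weights))
    print(run_two_tier(2.0, [1.0, 2.0, 3.0, 4.0], [10.0] * 4, weights))
```

A static, proportional split is the simplest possible choice; a production system would more likely balance load dynamically, which this sketch does not attempt.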
The plan to build out the system was audacious as well. The customer developed the system on Intel Xeon Processor E5-2600-based products and built a test cluster that would itself qualify for the TOP500 list. After debugging that system and the code that ran on it, NUDT planned to get the next-generation product (the future Intel Xeon Processor E5-2600 V2 family) and basically drop it in. This enabled them to go from first delivery of a blade to a completed system in about four weeks. The Linpack run came about a week later. By the time Jack Dongarra and others saw it at the end of May, it was running real applications.
For Intel's part, our factory and engineering teams validated two new products and put them into high-volume production within a week of the schedule we had predicted a year earlier.
I have been asked why Intel was chosen for this amazing system. The short answer is: because we deliver. And we delivered.
The Intel team has been working closely with end customers to get the first Xeon Phi systems up and running, including an innovative effort by Glenn Brooks at NICS to build an extremely efficient system using Intel Xeon Phi 5110P coprocessors while also carefully managing Xeon power. The result was the #1 most power-efficient system on the
In the past year, Big Data has emerged as a premier investment in business and academia. The use of HPC in the analysis of Big Data, and how Big Data technology will evolve beyond Hadoop, will be a major topic of discussion in the sessions and in the industry. How will storage change? How will compute change? How will the increased data bandwidth requirements be reflected in emerging interconnect models? I expect to find answers to these questions at SC12.
The top 10 supercomputers will be very interesting this time around. There has been relatively little change in the top 10 over the past two lists, so it will be fascinating to see whether that changes now. How high up will the Titan monster go? What efficiency will it achieve? What other new systems will there be in the top 10? One very well-informed person said to me in Hamburg, “This top500 list is the last gasp of the dying blue gene architecture…” Is he right? Will Blue Gene resurge? Or will hybrid architectures begin to retake a leadership role?
But it turns out there is a high likelihood that in the relatively near future Big Data and high-performance computing (HPC) might work together to unravel the mysteries of rare cancers like mine—and offer new hope to people like me.
So the GPU vs. MIC debate is in full force. NVIDIA and Intel are now mostly publicly aligned on the goal: an exaflop within 20 MW in this decade. The debate on performance is over; the debate on programming has begun.
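It helps to see what the 20 MW per exaflop goal means as an efficiency number. The short Python calculation below uses only the two figures in the goal (10^18 floating-point operations per second and a 20 MW power budget for the whole system); the constant and function names are just illustrative.

```python
EXAFLOP = 1e18          # floating-point operations per second
POWER_BUDGET_W = 20e6   # 20 MW, the stated power target for the whole system


def gflops_per_watt(flops_per_second: float, watts: float) -> float:
    """Energy efficiency in GFLOPS/W: operations per second per watt, scaled to giga."""
    return flops_per_second / watts / 1e9


print(gflops_per_watt(EXAFLOP, POWER_BUDGET_W))  # 50.0
```

In other words, hitting the target means delivering about 50 GFLOPS for every watt the entire machine draws.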

In a few weeks, Supercomputing 2011 (SC11) will be in Seattle. I live in Portland, Oregon, so this is basically next door. I love Seattle. I love the flying fish, I love the Mariners (yeah I know… my life is happy and I need a little pain for balance), but I especially love the Museum of Flight at Boeing Field. I love to be there among old Air Force Ones, a Blackbird spy plane, and vintage aircraft of all sorts. My son used to think the Museum of Flight was the finest place in the world. Now he thinks that's a sound studio in Southern California, but I digress…
This year's SC11 takes Data Intensive Science (DIS) as its primary thrust, and I anticipate some great papers from it. DIS is one of the areas that strains supercomputing architecture as we look ahead to the exascale era. Massive amounts of data exist in health and bioscience that can be brought to bear to reveal new patterns and new connections. My favorite example is the work (shown at IDF) by David Patterson (Berkeley) and David Haussler (UCSC) on the
How have machines like 