In this special guest feature, Intel’s John Hengeveld looks ahead to Data Intensive Science and other coming attractions at SC11.
In a few weeks, Supercomputing 2011 (SC11) will be in Seattle. I live in Portland, Oregon, so this is basically next door. I love Seattle. I love the flying fish, I love the Mariners (yeah I know… my life is happy and I need a little pain for balance), but I especially love the Museum of Flight at Boeing Field. I love to be there among old Air Force Ones, a Blackbird spy plane, and vintage aircraft of all sorts. My son used to think the Museum of Flight was the finest place in the world. Now he thinks that’s a sound studio in Southern California, but I digress…
There is a two-axis flight simulator there. My son went in and got twisted, turned around, and flipped upside down as he flew around a simulated Seattle in his simulated jet. Riding with him, I was thrilled (and maybe a teensy bit nauseated).
These things are great fun, but it doesn’t take a lot of compute power to create THAT immersive 3D experience. It takes a lot more to drive a 3D world or render a 3D movie.
When I was on the SC09 committee, I got deeply hooked on the technical content of the conference. Our conference had a thrust on supercomputing’s role in 3D. I observed first-hand how the conference committee does a great job of bringing forward papers with real meat to them. No marketing fluff, just real innovation.
This year’s SC11 takes Data Intensive Science (DIS) as its primary thrust, and I anticipate some great papers from it. DIS is one of the areas that strains supercomputing architecture as we look toward the exascale era. Massive amounts of data exist in health and bioscience that can be brought to bear to find new patterns and new connections. My favorite example is the work (shown at IDF) by David Patterson (Berkeley) and David Haussler (UC Santa Cruz) on the study of cancer genome mutations.
The cool thing about DIS is that it hits cloud, new compute architectures, and new storage architectures in one shot — that’s three hot topics in one. A broad thrust in the scientific community will twist and turn the HPC field almost as much as my son and I twisted and turned in that simulator.
Cloud as a means of data storage and search creates a great opportunity to bring together large quantities of public data. Innovative new compute architectures (like Intel’s MIC architecture or GPUs) facilitate understanding this data at high bandwidth. New storage architectures make finding relevant data efficient, so analysis can proceed efficiently as well. DIS taxes compute bandwidth, memory bandwidth, and IO bandwidth, but does so in a balanced way. There are so many examples of potential applications that recoding for each will be a problem, so it will be interesting to see how the different architectures position their solutions for this space. You can find more at the Nature.com blog on Data Intensive Science.
The last step of the DIS world is the democratization of access. Until a broad range of researchers can use the data that is publicly available, the rate of breakthroughs will be slow. This is another example of why standards that simplify HPC access are needed.
My son loves the physical sensation of flying an “Immelmann” — you climb into a half loop, then roll yourself upright, and you are flying in the opposite direction at a much higher altitude. DIS is kind of like that. I’m not sure, but some folks might get queasy.