Jülich Fires Up Eurotech Aurora for Exascale Research

 

In one of the very first hardware milestones on the road to Exascale, the Jülich Supercomputing Centre announced today that installation of the Eurotech Aurora hot-water-cooled HPC system has been completed.

The 128-node Aurora supercomputer will be used in the DEEP (Dynamic Exascale Entry Platform) project. The DEEP consortium, led by Forschungszentrum Jülich, proposes to develop a novel, Exascale-enabling supercomputing architecture with a matching software stack and a set of optimized grand-challenge simulation applications. DEEP takes the concept of compute acceleration to a new level: instead of adding accelerator cards to Cluster nodes, an accelerator Cluster – called Booster – will complement a conventional HPC system and increase its compute performance.

Aurora is expected to enable unprecedented scalability when it comes online for users in October. The Cluster-level heterogeneity of DEEP will attenuate the consequences of Amdahl’s law, allowing users to run applications with kernels of high scalability alongside kernels of low scalability concurrently on different sides of the system, avoiding at the same time over- and under-subscription. An extrapolation to millions of cores would take the DEEP concept towards an Exascale level.
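To see why Amdahl's law matters at this scale, here is a minimal Python sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of cores. The function name and the sample fractions are illustrative assumptions, not figures from the DEEP project.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed-size problem.

    parallel_fraction: share of the runtime that can be parallelized (0..1).
    workers: number of cores applied to the parallel part (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical kernels: one poorly scalable, one moderately, one highly scalable.
    for p in (0.50, 0.95, 0.999):
        for n in (16, 1024, 1_000_000):
            s = amdahl_speedup(p, n)
            print(f"parallel fraction {p:.3f}, {n:>9} cores: speedup {s:10.1f}")
```

Even with a million cores, a kernel that is 95 percent parallel cannot exceed a 20x speedup, which is the motivation for running low-scalability kernels on the Cluster side and high-scalability kernels on the Booster side.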




Video: The Future of MPI

 

In this video, D.K. Panda from Ohio State University presents: The Future of MPI. Recorded at the HPC Advisory Council Spain Workshop 2012 in Malaga. Download the slides (PDF).




Intel Hints at Weaving Network Fabric into Xeons, Atoms

 

By Timothy Prickett Morgan

If it wasn’t immediately obvious to you, Intel thinks the future of the systems business is weaving interconnection fabrics onto server processors – thus consolidating yet another component of the data center onto the processor and bringing to bear Chipzilla’s wafer etching process advantages on that unified chip. And, if Intel plays its cards right, giving it a sustainable advantage to keep arch-nemesis Advanced Micro Devices and up-and-coming rivals in the ARM collective at bay.

“We used to think of a server as a computer, but now the data center has become the computer,” Raj Hazra, general manager of technical computing at Intel, told El Reg. “There is a difference between networks and fabrics, and while there is a place for networks, they lack certain optimizations that fabrics have. Some applications need purpose-built interconnects, and fabrics look at compute and storage nodes as partitioned logical resources rather than as separate units of compute and storage. Problems are becoming superscalar across multiple machines, and that is driving new approaches of adding bandwidth and reducing latencies in that bandwidth. The fabric interconnect has become what was the system bus or processor interface.”

The problem, of course, is that many applications are so big that they cannot be solved in a shared memory system that gangs up multiple processors together in an SMP or NUMA cluster. SMP and NUMA systems pretty much run out of gas after 32 sockets, and there is not much more you can do about it beyond cramming more cores into a socket. Shared memory systems make programming easier because coders don’t have to deal with parallelism themselves – it is done by the processor, the chipset, and the memory controllers that make a moderately parallel machine look more monolithic.

If you want to scale further than SMP or NUMA, you need something that looks more like a modern supercomputer interconnect and the related programming and scheduling tools that are tuned for it.

Everybody in the server chip racket knows they need some sort of fabric interconnect because it is, in effect, the new chipset for scalable computing. And a fabric is more than just a network, which can allow anything to talk to anything. Fabrics are tuned for specific workloads and are designed to deliver predictable performance without jitter and other side effects.

Feel the width
Fujitsu has the very interesting “Tofu” 6D mesh/torus interconnect in its K supercomputer and its commercial variant, the PrimeHPC FX10 clusters. IBM has created a bunch of interesting ones over the years for its Power Systems. Its latest ones are the 5D torus in its BlueGene/Q supers and the 1.13TB/sec (that’s bytes, not bits) hub-switch interconnect in the Power 775 clusters formerly known as “Blue Waters.”

Silicon Graphics and Cray have their own respective NUMAlink and XE interconnects. The latest UV2 shared memory supers from SGI use NUMAlink 6, a substantially improved and more compact design than prior NUMAlink interconnect fabrics, and notable because it implements shared memory across a maximum of 512 Xeon E5-4600 nodes. Advanced Micro Devices snapped up SeaMicro for $334m in March to get its hands on the “Freedom” 3D mesh/torus interconnect that bears some resemblance to the BlueGene family of interconnects.
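For readers unfamiliar with the torus topologies mentioned above, the following Python sketch shows the general idea: each node has a coordinate in N dimensions, and links wrap around at the edges, so every node has up to two neighbors per dimension. This is a generic textbook torus, not the actual wiring or routing of Tofu, BlueGene/Q, or Freedom; the function names and dimension sizes are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def torus_neighbors(coord: Sequence[int], dims: Sequence[int]) -> List[Tuple[int, ...]]:
    """Return the nodes one hop away from coord in a torus of the given shape."""
    if len(coord) != len(dims):
        raise ValueError("coord and dims must have the same number of dimensions")
    origin = tuple(coord)
    neighbors: List[Tuple[int, ...]] = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            node = list(origin)
            # The modulo implements the wraparound link that turns a mesh into a torus.
            node[axis] = (node[axis] + step) % size
            candidate = tuple(node)
            # Skip self-links (size 1) and duplicates (size 2).
            if candidate != origin and candidate not in neighbors:
                neighbors.append(candidate)
    return neighbors


def torus_hops(a: Sequence[int], b: Sequence[int], dims: Sequence[int]) -> int:
    """Minimum number of hops between two nodes, taking wraparound links into account."""
    return sum(
        min(abs(x - y), size - abs(x - y))
        for x, y, size in zip(a, b, dims)
    )


if __name__ == "__main__":
    dims_3d = (4, 4, 4)  # hypothetical 3D torus with 64 nodes
    print(torus_neighbors((0, 0, 0), dims_3d))
    print(torus_hops((0, 0, 0), (3, 3, 3), dims_3d))
```

Adding dimensions raises the number of links per node while shortening the worst-case hop count for a given node count, which is one reason vendors have moved from 3D toward 5D and 6D designs.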

At the other end of the spectrum, the ARM server processors just shipping or in development from Calxeda, Applied Micro, and Marvell all have on-chip networking of various kinds. These vendors are actually blazing the trail by adding switches or routers to compute devices.

Intel can’t let the ARM collective take the lead here, and it has also made no secret that it wants a big piece of the exascale supercomputing market. It has made a number of key, strategic acquisitions to put a very serious stake in the ground. And as we all know, ideas that start in HPC systems often make their way into commercial systems down the road. In this case, fabric interconnects may end up in regular systems a little sooner than many expect – particularly considering the parallel nature of many database, data warehousing, big data systems, and web caching these days.

This is why Intel bought Ethernet chip specialist Fulcrum Microsystems back in July 2011 for an undisclosed sum (probably well north of $100m based on the VC money Fulcrum blew over the years), and then ate the InfiniBand adapter and switch business from QLogic for $125m in January of this year. (In a way, this was a homecoming for InfiniBand, a standard that Intel helped to create along with IBM and that, like Itanium, was supposed to take over the world. It’s tough to beat that x86 instruction set or Ethernet, though.)

The icing on the cake for Intel when it comes to interconnects was the $140m acquisition of the “Gemini” XE6 interconnect and the future “Aries” interconnect, due to be commercialized next year, from Cray. Under that deal, Cray gets to build and sell machines using Gemini and Aries interconnects, but Intel gets the 74 people who know how to make and support them and all the intellectual property behind them. Cray gets three or four years to figure out what it wants to do with itself as Intel basically takes over the core engineering work that Cray did. (Software and services, anybody?)

Both Gemini and Aries are based on a high radix router design, with Gemini being a dumbed down version of Aries that is meant to plug into the HyperTransport links of Opteron processors. Aries will plug into PCI-Express 3.0 ports and therefore not be tied to any specific processor. (Well, we’ll see about that, with Intel owning Aries now.) Gemini was designed to scale to 1 million cores in a single system (not with a shared memory architecture, but just as a high-speed cluster), and Aries will no doubt scale further than that and may even have shared memory features for modestly sized clusters with a few thousand cores.

Plugging the pieces together
With its own Ethernet, InfiniBand, and Cray interconnects and lots of people who know about system I/O and server-side networking, Intel has a formidable set of assets from which it can build fabric interconnects and gradually move them onto the Xeon and Atom dies to make very scalable clusters of compact and efficient server nodes. But don’t think Intel is going for a one-size-fits-all approach.

“The fabric requirements for each segment of the market are not the same, and their bandwidth and latency needs are not growing at the same rates,” said Hazra.

Use cases of fabrics in conjunction with servers

All fabric interconnect makers face the same issues, he said. Customers want scalable link bandwidth and to fully utilize that bandwidth so they don’t waste power. (An interconnect chip can burn as much juice as a CPU, after all, and needs to do the same power gating and data prefetching that CPUs and their caches do.) They want low latency and predictable, deterministic latency at the same time. They also want to be able to scale from thousands of nodes to hundreds of thousands of nodes, and they want the ability to carve a cluster into secure tenants with quality-of-service guarantees. But each customer set has slightly different challenges:

Challenges with modern interconnect fabrics

Intel wants to rule system interconnect fabrics like it does server processors and increasingly does with storage array processing – and it will also have to rule main and flash memories for systems, too, since these will be integral parts of future exascale computing systems and their commercialized offshoots.

The first step was to get the intellectual property and foundation for various fabrics. The next step will be to integrate fabrics onto server processors and coprocessors like the Xeon Phi x86 compute coprocessor that will come to market early next year.

Intel will integrate fabrics onto future processors
Hazra wouldn’t give out a lot of details about precisely how Intel will integrate fabrics into chips. But there are clearly three options. Intel can move a fabric controller onto the system motherboard beside the CPU socket, as Cray, SGI, Fujitsu, IBM, and AMD/SeaMicro do with their respective interconnects. It can move the interconnect controller into the processor package but have it be a distinct chip from the processor, as was done with early hybrid CPU-GPU processors for laptops. Or Intel can use the transistor budget and process shrink to put the fabric controller right on the CPU die, much as it has moved main memory controllers and now PCI-Express 3.0 peripheral controllers onto the die with the Xeon E3 and E5 server processors.

“We haven’t worked through all the details yet, and we certainly have not announced if fabric integration onto the chip will happen in one step or two steps,” Hazra tells El Reg. The packaging and chip processes available in volume at the time that Intel hopes to get fabric integrated with Xeon, Xeon Phi, and server variants of Atom processors will ultimately determine what Intel does. “Be prepared to be surprised,” Hazra says with a laugh.

Intel is looking at a number of processor and fabric interconnect combinations, and will not say which of the three interconnects will end up with what processors. And, finally, Hazra hints: “Don’t assume that we will integrate an existing fabric interconnect, such as Ethernet or InfiniBand or Aries. We are innovating with the fabrics, too.”

That is a pretty powerful hint, and it brings to mind the switch-hitting SwitchX ASICs that Mellanox Technologies has cooked up, which are at the heart of its own Ethernet and InfiniBand switches and can speak either protocol.

Cray already has a software stack that makes the Gemini interconnect look like Ethernet to a Linux operating system, and Intel could go so far as to create an Aries derivative that could function like an Ethernet, InfiniBand, or Aries controller in a server cluster. Or, if that is too ambitious, Intel could do an Aries integration for high-end supers and a converged Ethernet/InfiniBand controller for more generic servers. Oddly enough, a trimmed down Aries interconnect that spoke Ethernet and could scale to hundreds of thousands of Atom cores is probably what web infrastructure companies might want. And could take on AMD’s newly acquired Freedom interconnect.

This is going to get very interesting very fast. Brace yourself. ®

This article originally appeared in The Register. It appears here in its entirety as part of a cross-publishing agreement.




Video: Hurricane Katrina – 7 Years Later

 

In this video, NASA uses state-of-the-art visualization to revisit Hurricane Katrina, which struck the Gulf Coast on August 29, 2005. Before and during the hurricane’s landfall, NASA provided data gathered from a series of Earth-observing satellites to help predict Katrina’s path and intensity. In its aftermath, NASA satellites also helped identify the areas hardest hit.




GTC 2013 Opens Call for Submissions

 

The GTC 2013 Conference is seeking submissions from GPU industry experts and academia. The event will take place in San Jose on March 18-21, 2013.

Submissions should describe your work using the GPU for parallel computing or visualization; the work can be completed or currently in progress. If the work is still in progress, please provide additional information about when you expect final results.

Session Submissions are due Oct. 3, 2012. A Call for Poster Submissions will open Oct. 15, 2012.




Swiss CSCS Moves to Allinea DDT Debugger

 

This past spring, I was fortunate enough to tour the new CSCS Swiss national supercomputing center, a state-of-the-art facility cooled with lake water. Now CSCS is refreshing its development tools by moving to Allinea DDT.

CSCS hosts Switzerland’s premier HPC systems including the Monte Rosa and Todi Cray XE6 and XK6 machines. With over 50,000 CPU cores and around 200 powerful NVIDIA Tesla X2090 GPU cards on site, developers at CSCS are aiming to achieve significant scientific breakthroughs with their applications.

“Allinea DDT meets exactly the debugging needs of our users and allows us to take an important step forward in parallel debugging,” says Thomas Schoenemeyer, Associate Director Technology Integration of CSCS. “Our developers appreciate the unique debugging features such as the excellent parallel stack viewer and other integrated features that allow them to create distinct groups, which are very useful when debugging thousands of processes. The novel sparklines are exceptionally useful for examining data in parallel.”

Read the Full Story.




IBM Building 3D Chips Like Sandcastles

 

Did you know that datacenters in the United States already consume two percent of the electricity available, with consumption doubling every five years? At this rate of growth, a supercomputer in the year 2050 will require the entire production of the US energy grid.

To address this challenge, IBM scientists are researching vertically stacked chips, also known as 3D chip stacks. Under investigation are innovative manufacturing solutions using a natural phenomenon that children around the world appreciate every summer while building sand sculptures — capillary bridging in wet sand.

“One of the main challenges for 3D chip stacks is to keep the transistors at temperatures below 80 degrees Celsius, while also considering the multiple chips dissipating heat to a shared heat sink at the backside of the stack,” said IBM Scientist Thomas Brunschwiler. “Hence, a low thermal resistance under-fill material is required in the space between the chips formed by the electrical connections. Improvements with traditional capillary under-fills have only resulted in moderate thermal performance.”

According to Brunschwiler, this research could eventually lead to the development of supercomputers the size of a sugar cube. Read the Full Story.




Job of the Week: Computational Scientist at Rice University

 

Rice University is seeking a Computational Scientist in our Job of the Week.

The Computational Scientist will work with Rice research faculty and staff and their collaborators to maintain, develop, and advance our efforts in high performance computing. The position is particularly focused on supporting complex research applications which are running on scalable high performance computing resources at Rice. The incumbent will engage with world leading researchers on the innovative use of high performance computing and interact with faculty from across Rice engaged in computationally enabled research. The successful candidate in this position will be involved in analysis, design, development, porting, optimization and testing of advanced research software-codes and algorithms.

Are you paying too much for your job ads? Not only do we offer ads for a fraction of what the other guys charge, our insideHPC Job Board is powered by SimplyHired, the world’s largest job search engine.

As a reminder, we are offering FREE job listings for .EDU and .GOV domains, so email us at info @ insideHPC.com for a special discount code.




Video: Tulsa Supercomputer Coming to City Hall

 

This ONR news report profiles preparations for a new supercomputer coming to City Hall in Tulsa, Oklahoma.

The Tulsa Supercomputing Center along with Tulsa Research Partners is the centerpiece of a broader effort by the Institute over the last six years to build an innovative economy driven by its central theme of “Research to High-Impact Jobs.” The center will serve universities, colleges, research centers, corporations, small business and entrepreneurial growth companies to address computational needs across multiple industries and disciplines.

While system configuration and vendor details have not been released, the supercomputer is expected to come online in November 2012. Read the Full Story.




A Peek Inside Oak Ridge and Supercomputing’s Future

 

Over at Datacenter Knowledge, Rich Miller takes a tour of the 18-megawatt datacenter at Oak Ridge National Laboratory, home of three of the most powerful Cray supercomputers in the world. He also provides a glimpse of the future.

“We envision two systems beyond Titan to achieve exascale performance by about 2018,” wrote Jeff Nichols, Associate Laboratory Director for Computing and Computational Sciences. “The first will be an order of magnitude more powerful than Titan, in the range of 200 petaflops. This system will be an exascale prototype, incorporating many of the hardware approaches that will be incorporated at the exascale. We hope to scale this solution up to the exascale.”

Read the Full Story.




Dataflow an Effective Model for HPC

 

Jörg Lotze from Xcelerit writes that the dataflow model can be hugely beneficial to high performance computing.

When asked to describe a data processing algorithm, domain specialists, for example engineers, researchers, or mathematicians, often walk to the white board and draw boxes with different processing stages and connect them with arrows. This effectively is dataflow – and shows that this way of thinking is natural in many problem domains. The dataflow programming model with its ‘shared-nothing’ semantics and explicitly expressed data dependencies provides pipeline parallelism by its very nature (a form of task parallelism). That is, all actors can execute concurrently on different sections of the data.
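The boxes-and-arrows picture described above can be sketched in a few lines of Python. In this minimal illustration, each actor is a thread that owns no shared state and talks to its neighbors only through queues, so the data dependencies are explicit. The names `actor` and `run_pipeline` are hypothetical and this is not Xcelerit's API; note also that CPython threads interleave rather than run CPU-bound work truly in parallel, so the sketch shows the structure of pipeline parallelism rather than its performance.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the data stream


def actor(func, inbox, outbox):
    """Apply func to every item arriving on inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream to the next stage
            return
        outbox.put(func(item))


def run_pipeline(stages, data):
    """Connect the stage functions with queues and stream data through them."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=actor, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for thread in threads:
        thread.start()

    # Feed the first queue; downstream actors start working as soon as items arrive.
    for item in data:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Three boxes connected by arrows: add one, square, negate.
    stages = [lambda x: x + 1, lambda x: x * x, lambda x: -x]
    print(run_pipeline(stages, range(5)))
```

Because each stage has a single worker and the queues are first-in, first-out, results come out in the same order the inputs went in, while different stages can be busy with different items at the same time.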

Read the Full Story.




Defining CUDA as a Programming Model

 

Nvidia’s Mark Ebersole writes that CUDA is not an API or language – rather, it is a powerful mainstream tool that allows you to easily unlock the power of GPU acceleration.

It is much more than that. CUDA is a parallel computing platform and programming model that makes using a GPU for general purpose computing simple and elegant. The developer still programs in the familiar C, C++, Fortran, or an ever expanding list of supported languages, and incorporates extensions of these languages in the form of a few basic keywords.

Read the Full Story.




Video: Introduction to Parallel Programming with OpenMP

 

In this video, Tim Mattson from Intel gives a lecture on OpenMP basics.

We introduce OpenMP, an industry-standard API for programming shared memory computers. OpenMP provides a simple path for programmers to get started with parallel programming. In this lecture, we’ll focus on the core features of the original versions of OpenMP.

This video was recorded at the 2012 Par Lab Bootcamp at Berkeley. Download the slides (PDF).




A Primer on Computational Fluid Dynamics

 

Kevin Tubbs from Dell has started a series of blog posts about Computational Fluid Dynamics.

Time to final solution with CFD solvers greatly influences the rate at which analysis and design decisions can be made. Because there is a wide variety of methods and uses, HPC components play an important role in the delivery of efficient HPC tools for scientists and engineers.

Read the Full Story.




Hengeveld: Big Data Meets HPC to Solve Hard Problems and Improve Lives

 

By John Hengeveld

John Hengeveld is the HPC Segment Marketing Director for Intel’s Technical Computing Group.  His Intel Developer Forum session titled “Big Data Meets High Performance Computing” will take place at 3:30 p.m. Wednesday in Room 2002 of Moscone West, San Francisco.

I’ve been hearing a lot of buzz about “Big Data” … people talking in terms of mining Facebook posts for marketing data. I didn’t take all the talk seriously at first, but I do now. … Let me tell you how Big Data might just save my life.

In March, I had a major appendix attack. And it turns out that within my appendix was a material called appendiceal mucinous neoplasm, which is a very rare type of cancer.  There is no cure for my cancer—not yet, anyway. I’m just hanging on and crossing my fingers and hoping things work out.

Now, the first time my doctor went over the pathology report, she told me I had a 30-60 percent chance of having less than seven years to live. But then I got some good news from my doctors. After a lot of study and analysis, they offered a more encouraging assessment. They reasoned that I had a better-than-average prognosis after all, given that I didn’t appear to have very much of the material or to have had a lengthy exposure to it. So I went back to work.

But it turns out there is a high likelihood that in the relatively near future Big Data and high-performance computing (HPC) might work together to unravel the mysteries of rare cancers like mine—and offer new hope to people like me.

I like to think of Big Data as an oil field with a lot of breadth and a lot of depth. To get value out of the field, you need a powerful pump, and that’s HPC. The HPC pump allows you to draw insights from the Big Data. Today, researchers are doing just this across a broad spectrum of fields. For me, the research being done in the field of genomics hits closest to home, because this research could eventually lead to a world of personalized therapies based on a genomic analysis of a patient’s cancer.

This is one of the topics we will dive into during a session I will lead Wednesday at the Intel Developer Forum. That session—titled “Big Data Meets High Performance Computing”—will include an appearance by Professor Michael Franklin, a computer scientist who directs the AMPLab at UC Berkeley, one of the leading teams working on applications of Big Data to a new generation of problems.

Professor Franklin will explore some of the latest innovations in five applications that combine Big Data with HPC. These applications range from genomics research to crowd-sourcing to increase battery life on your cell phone (yes, it works—I’ve done it). I, of course, will have a special interest in the discussion of the role that Big Data and HPC can play in helping researchers understand the genetics in cancers and formulate appropriate therapies.

Already, people at Berkeley are using HPC to study the public data on cancer genomes. They have accessed what’s called The Cancer Genome Atlas. This atlas shows the genomics of tumors and their hosts. The study is focused on finding the mutations that have derived the cancers from the hosts, and then using that knowledge to understand the nature of the mutations that are occurring and how they might be blocked or eliminated.

This kind of research is good news—not just for me but for many other cancer patients to come. In this sense, Big Data and HPC provide hope for the future.

From my perspective, Big Data is not about sifting through massive numbers of Facebook posts and seeing who the “likes” are. It’s really about generating insights to solve hard problems and improve the lives of people.






insideHPC.com is a production of insideHPC, LLC. © 2006-2013