Scaling HPC at the Jülich Supercomputing Centre

In this special guest feature, Tim Gillett from Scientific Computing World interviews Norbert Attig and Thomas Eickermann from the Jülich Supercomputing Centre about how JSC is tackling high performance computing challenges.

SCW: What separates JSC from the other Gauss centres?

Norbert Attig and Thomas Eickermann: For many years, we have been dealing with highly scalable systems. Within Gauss it was decided that we should have different architectures at the different centres, and from the very beginning we decided that we should concentrate on scalable systems.

The other thing that distinguishes us is our unique support environment. We were the first to become aware that allocating a single adviser to a particular project was not helpful when you look at scalability, community codes, and so on – you need a structure that retains knowledge about a specific project over a longer period of time. This is the reason that we changed our whole support concept in 2007, and came up with the idea of the SimLab, or simulation laboratory.

A SimLab is a unit of HPC experts working together with technicians, and these experts each have knowledge of one specific discipline. They work in one particular field, so they are able to provide support for their community while also doing research in that field.

We started with plasma physics at the very beginning; fluid and solid engineering, biology, and molecular systems followed – a whole set of SimLabs that came over time and were evaluated by the community. Our research areas centre on the basic natural sciences: chemistry, biology, engineering – and over the last couple of years users from earth systems science and also from neuroscience have joined our user community to make use of our systems.
From the past we also have very strong links to the quantum chromodynamics (QCD) community, and in the beginning the Blue Gene was seen primarily as a QCD machine – but it turned out, with our first tests, that it should not be dedicated only to this community, because many other communities were clearly able to take advantage of the machine.

Another distinction from the other two Gauss centres is that we have no formal connection to a university; we are focused almost completely on supercomputing, rather than supporting a university IT department – though, of course, we do have some university links.

SCW: What drove the decision to use IBM Blue Gene for the main HPC system?

Norbert Attig and Thomas Eickermann: We initially took up the Blue Gene technology in 2005. At that point in time, most of the large supercomputer centres employed IBM Power 4 systems, and so did we – it was a very successful series of systems – but it became obvious to us that this would not be a suitable architecture if we wanted to move towards successful, highly scalable systems, because of concerns over price/performance and energy consumption per unit of performance. In that respect, we had come to a sort of dead end and were looking for alternatives.

Then, in 2005, the first Blue Gene system was announced by IBM and we took the opportunity to buy the smallest system you could get. We tested it and it was surprisingly successful. In several steps we extended that system and also went through the different Blue Gene generation changes, up to the Blue Gene Q, which we have now. By growing slowly, we were also able to take our users with us; it was a continuous process of change.

Power 4 was a very powerful system but consumed a lot of energy, whereas Blue Gene was the opposite: the processors were individually a bit slower and had less memory, but many more of them could be installed in a single rack. Users had to adapt to that – to rewrite their codes so that each process could live with less memory, and to exploit a much higher level of parallelism, scaling up to roughly 450,000 cores.
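The adaptation described here – less memory per process, far more processes – typically means distributing a problem's data across MPI ranks instead of replicating it on every node. The sketch below, in C with MPI, illustrates the idea; the problem size, variable names, and the trivial workload are illustrative assumptions, not taken from any JSC code.

```c
/*
 * Minimal sketch of adapting a code to many low-memory cores:
 * a global 1-D domain is split across MPI ranks, so the per-rank
 * memory footprint shrinks as the rank count grows.
 * Illustrative only -- not code from JSC or any specific application.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define GLOBAL_N 1048576L  /* global problem size (illustrative) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank owns only its slice of the domain: memory per rank
     * falls roughly as 1/nprocs, which low-memory nodes demand. */
    long local_n = GLOBAL_N / nprocs + (rank < GLOBAL_N % nprocs ? 1 : 0);
    double *local = malloc(local_n * sizeof *local);
    if (local == NULL)
        MPI_Abort(MPI_COMM_WORLD, 1);

    /* Do some local work on the slice (here: just fill and sum it). */
    double local_sum = 0.0;
    for (long i = 0; i < local_n; i++) {
        local[i] = 1.0;
        local_sum += local[i];
    }

    /* A global reduction recombines the distributed result, so no
     * single rank ever needs to hold the whole problem. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d, per-rank elements=%ld, global sum=%.0f\n",
               nprocs, local_n, global_sum);

    free(local);
    MPI_Finalize();
    return 0;
}
```

In many real applications the local slices would also exchange boundary data with neighbouring ranks, and keeping that communication efficient is where much of the effort in scaling to hundreds of thousands of cores goes.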

Not every user is able to follow that path because not every application is suited to that; therefore we also need a more general-purpose computer to suit the needs of users who are not able to scale their codes to that extreme level.

SCW: Could you tell me about the High-Q Club? This seems to really tie in with the concept of highly scalable systems…

Norbert Attig and Thomas Eickermann: The High-Q Club is designed to get an application running across all 28 racks of the Blue Gene Q system. The initiative has been running for about three years and we have been very surprised by its success. We thought we had nothing to offer and were not able to give a prize; we could only tell users: ‘If you have an application that runs on all 28 racks, you can become a member of this club.’

This offer seems to be so attractive that we are overwhelmed by requests to join our workshops – where, within a week, we try to bring people up to the standard required for high-end scaling applications. So far we have in the order of 25 applications in the club – that’s 25 codes that have more or less used the full capacity of the Blue Gene system.
Of course, the name of the club is closely related to Blue Gene Q, so the question is what we will do, as we know that the Blue Gene line is going to stop at Q! It’s not clear at the moment.

SCW: Will the JSC continue to invest in IBM for new supercomputers in the future?

Norbert Attig and Thomas Eickermann: The key point, of course, is that we usually do open procurements, so we cannot say that we will definitely invest in IBM or any other specific company. This does not mean that we are arbitrarily picking whatever is on the market, but we have to consider that, for our users, it’s a huge job to port a code from one architecture to another.

We will try to achieve some form of continuity in terms of architecture, but that doesn’t necessarily mean a continuity in terms of vendors. If you look at our past, you’ll see that up to 2004 we were using Cray systems (at that time Cray was more or less a synonym for supercomputing), then in 2004 we started with IBM. However, IBM was not our only choice. The main reason we have stayed with IBM for so long is the uniqueness of the Blue Gene Q. That’s no longer the case – of course, we are still in touch with IBM but we couldn’t say that we will be going with them into the future.

We are a member of the OpenPOWER Consortium but it’s not a natural continuation because programming a GPU is quite different from programming a Blue Gene. Also, you have to choose from what is actually available in terms of technology!

Norbert Attig is deputy head of JSC; Thomas Eickermann is head of communication systems at JSC.

This story appears here as part of a cross-publishing agreement with Scientific Computing World.
