In this special guest feature, Mike Bernhardt from Intel discusses the importance of code modernization with PNNL’s Karol Kowalski, Capability Lead for NWChem Development.
Mike Bernhardt: NWChem is an open source high performance computational chemistry tool developed for the Department of Energy (DOE) at Pacific Northwest National Laboratory (PNNL) in Richland, Washington. Can you give us a recap of NWChem’s history?
Karol Kowalski: The NWChem project started around the mid-1990s. It is the DOE’s premier code for computational chemistry, and from the very beginning the code was designed to take advantage of parallel computer systems, ranging from conventional workstations to leadership-class computers at the supercomputing centers. So the whole idea behind NWChem is to provide a scalable solution to tackle really important and complex problems in computational chemistry. NWChem can tackle ground- and excited-state properties of small molecules, large molecular assemblies and condensed-phase systems in a complex environment with methods that provide a high level of accuracy.
Mike Bernhardt: You say it was designed for parallel computing systems. How has the design fabric changed for manycore since the original design 20 years ago?
Karol Kowalski: The notion of High Performance Computing is evolving over time. So what was deemed a leadership class computer five years ago is a little bit obsolete. We are talking about the evolution not only in the hardware but also in the programming models because there are more and more cores available. Orchestrating the calculations in the way that can effectively take advantage of parallelism takes a lot of thinking and a lot of redesign of the algorithms behind the calculations.
Mike Bernhardt: What do you think of the phrase “code modernization”?
Karol Kowalski: I would prefer code development. It’s a kind of natural progression. You can of course view it as modernization, but we look at this as a continuous process of keeping pace with the progress in the hardware. Modernization is ongoing.
Mike Bernhardt: What is the biggest challenge you face in the optimization of the code as you move into manycore computing?
Karol Kowalski: Our biggest challenge is restructuring the algorithms behind our codes, because what works well on homogeneous architectures might not be very efficient in the context of heterogeneous computation. This requires a thorough rewrite or rethinking of the underlying algorithms.
Mike Bernhardt: Can you give me a sense of how big and complex the NWChem code is and how many computational staff are working on it?
Karol Kowalski: Since going open source in 2010, NWChem code has been downloaded more than 50,000 times worldwide, so not only in the U.S. but in Europe and other continents as well. The program contains seven million lines and we have modular extractions of the code. Each module deals with a different methodology which can tackle different problems. The beauty of this modular extraction of NWChem is the fact that those modules can talk to each other. So you can design pretty complicated workflows, which I would dare to refer to as a kind of virtual laboratory. You can literally design virtual chemical experiments.
Mike Bernhardt: You presented a paper at SC14 around the NWChem code and recent achievements. Can you tell me what that is about?
Karol Kowalski: The paper documents our effort to make very accurate methods faster and applicable to bigger systems. We see remarkable speedups in the numerically intensive parts of those methods. I’m talking in particular about the coupled cluster class of methods; the CCSD(T) method is currently the driving engine behind high accuracy simulations. So it was a high priority for us to enable this code for our users and let them use all the resources of the Cascade system at PNNL.
Mike Bernhardt: How important have the Intel Xeon processors and Intel Xeon Phi coprocessors and Intel architecture been to that process?
Karol Kowalski: It’s important, and we see the change in heterogeneous computing in general. The architecture offers a huge resource to speed up our codes. To be more precise, all of a sudden, methods which were deemed to be too expensive for a certain class of systems can be applied to more realistic systems and processes thanks to high performance computing systems and heterogeneous architectures. The heterogeneous architecture provides enough flops to bring those very expensive jobs to completion.
Mike Bernhardt: What do you mean by heterogeneous architectures?
Karol Kowalski: There are of course several main players right now. Intel is among them. We show in our supercomputing papers that we are able to take advantage of over 60,000 cores with the concurrent utilization of the Intel Xeon Phi coprocessors.
Mike Bernhardt: Going back to the big picture, why does NWChem matter?
Karol Kowalski: This would have to go back to our user base. NWChem has evolved over the years to be a major community code. There’s a large number of users worldwide using NWChem in computational chemistry simulations. Even locally I see NWChem playing a more and more important role in building the synergy between theory and experiment at PNNL.
Mike Bernhardt: So PNNL with EMSL is one of the few facilities where scientists can match up computational work, theory, and experiment with the systems that are required to give that balance – which is an environment that is somewhat unique?
Karol Kowalski: That’s correct. One of the ongoing projects is geared towards building those workflows. So the user can come here, run the experiment, run the calculations and at the end of the day have two sets of data for comparison and for building an understanding of the processes they are looking at. So this new cutting edge functionality will be available pretty soon. We are very happy that this has materialized.
Mike Bernhardt: Can you give me an example of something that might have been done 10 years ago with NWChem and how it’s being done today?
Karol Kowalski: My role in NWChem is to develop high accuracy methods for ground and excited states, and we have recently demonstrated that those methods can deliver very accurate predictions for excited states and properties of molecular systems. At the same time, we’ve shown how time to solution can be reduced by using really massively parallel systems composed of over 200,000 cores. Scientists are driven by curiosity but they can be impatient. Instead of waiting for weeks or months or years, scientists prefer to get answers in hours. This drives the NWChem development.
Mike Bernhardt: The theme at SC14 is “HPC Matters.” How would you describe this?
Karol Kowalski: We develop predictive models in chemistry and they allow us to tackle more and more complicated problems and processes. This wasn’t possible, say, ten years ago. We can run significantly bigger calculations and we can also look at dynamical processes at much longer timescales. The problems we are solving here are of national interest, related to the environment, waste management, biochemistry, biology and materials science. All those research areas are vital for the economy and also for the nation.
Mike Bernhardt: So you believe HPC can be a key factor in improving the quality of life?
Karol Kowalski: HPC will definitely, radically change the landscape of computational chemistry as we know it now. I also personally believe that HPC will stimulate the development of new theoretical approaches, novel theories for dealing with really complicated systems, very challenging systems because current challenges sometimes require an unprecedented level of accuracy. With HPC we can seriously think about the new generation of methods. And we can validate those methods and quantify those methods using supercomputers. There’s a lot ahead of us and I believe there are still very important collective phenomena in the molecular world, in solids and materials which we are not aware of, and high performance computing will play a very important role in future discoveries in this area.
Mike Bernhardt: What are you most proud of?
Karol Kowalski: I am most proud of my team here at EMSL and PNNL and what we have achieved over the last ten years.
Mike Bernhardt: Is there anything you want the rest of the HPC community to know about NWChem or the work here at EMSL?
Karol Kowalski: NWChem sits in the area of computational chemistry that is the nexus of several important things: theory development, novel computational algorithm development, and new cutting edge applications in science domains. Covering all these areas requires a lot of effort. We would be happy to collaborate with whoever is interested in contributing to NWChem.
In this video, PNNL scientists discuss the importance of code modernization in their quest to drive leading-edge science.