Despite their incredible capabilities, today’s supercomputers typically have only three years of operating life before they need an upgrade. With the march of Moore’s Law, faster, more efficient systems are always waiting to replace them.
A novel program at Stanford is finding a second life for used HPC clusters, providing much-needed computational resources for research while giving undergraduate students a chance to learn valuable career skills. To learn more, we caught up with Dellarontay Readus from the Stanford High Performance Computing Center (HPCC).
insideHPC: Can you tell us about yourself and the cluster program?
Dellarontay Readus: I am a Senior Software Engineer with the Stanford High Performance Computing Center and an undergraduate student studying Computer Science with a concentration in Artificial Intelligence. The HPCC acts as a work experience program where students are immersed in cluster-building exercises. Our staff is made up mainly of 10 undergraduate students from Stanford University. We are a self-sustaining, student-run organization that recovers its operating costs for wages and equipment by providing access to compute clusters. Some 95% of our revenue comes from offering compute cycles to federally sponsored research programs. We continue to update our cluster infrastructure as an academic service center on the strength of a long track record of successful federal grant proposals; through these grants we obtain new systems, clusters, and parts to provide compute cycles to the academic community.
insideHPC: Can you tell us some configuration details about your HPC cluster and where the components came from?
Dellarontay Readus: We are holding a celebration on February 14th to mark the shutdown of our legacy cluster, Certainty. The cluster was part of a $6 million proposal submitted to the NSF MRI-R2 program and funded by the American Recovery and Reinvestment Act. Certainty was a hybrid CPU-GPU cluster consisting of 600 compute nodes, an InfiniBand interconnect, and high-speed parallel storage.
Our new cluster, which replaces Certainty, is a hybrid of the Shepard and Yellowstone clusters. These systems originate from separate federal grant proposals valued at a combined $15 million. Once fully deployed, the new cluster of approximately 1,200 nodes will anchor the HPCC’s technological infrastructure for the next few years.
insideHPC: Can you tell us more about your users and the types of research they’re doing?
Dellarontay Readus: Our users range from researchers working on federally sponsored grant projects to other academic researchers, primarily in the field of computational fluid dynamics. We also work with faculty, students, and nearby communities through outreach programs that provide access to modern compute platforms. Through ME 344, a course taught at Stanford, students get direct experience with cluster building, system administration, application deployment, and a variety of tools for building HPC systems.
insideHPC: What have you already learned in terms of HPC job skills by participating in this program?
Dellarontay Readus: In the five years I have had the privilege of working with the HPCC, I have completed cluster builds across a range of configurations, platforms, and hardware. I became familiar with bash, UNIX, and Linux while troubleshooting those very builds. One of my most valuable experiences was helping to teach ME 344 for the second time: I was able to support students far more confidently than before, and one student was inspired enough to join the HPCC. I have also worked on the practical, long-term aspects of maintaining an academic service center. One spring, when one of our clusters began to overheat and shut down, I helped pull nodes and reapply thermal paste to the Intel CPUs. And when the first successful proposal I took part in, for the Shepard cluster, was funded and the system arrived on campus, I was one of the students pushing the servers off the truck and into the building.
insideHPC: Can you give us a preview of your upcoming presentation at the Stanford HPC Conference in April?
Dellarontay Readus: HPCC students will present on the deployment of our new systems.
insideHPC: What’s next? Are you looking for more equipment donations, tutors, and participants?
Dellarontay Readus: The Stanford HPCC is always looking for more sponsorship, pointers toward possible grants for cluster compute technology, and more visitors to attend our annual conferences and workshops. Personally, I would appreciate additional sponsorship from NVIDIA, whether through pre-built systems or compatible GPU components, as I believe that would benefit the academic community.
In related news, registration is now open for the Stanford HPC Conference, which takes place April 21-22.