insideHPC Vanguard: LLNL’s Kathryn Mohror — A Passion for Managing Scientific Data

Kathryn Mohror

Kathryn Mohror was introduced to HPC in 2002 as a graduate student at Portland State University, where she studied new remote memory access features of the Message Passing Interface (MPI). She got hooked on achieving the best performance possible on HPC systems.

She joined Lawrence Livermore National Laboratory (LLNL) in 2010 as a postdoctoral research staff member in the Center for Applied Scientific Computing and is now a distinguished member of technical staff and deputy director of the Laboratory Directed Research and Development Program at the lab.

She has held several positions – she was a Data Analysis Group Leader and a valued member of the Exascale Computing Project (ECP), representing software technologies for the National Nuclear Security Administration. She was recently recognized as an Emerging Woman Leader in Technical Computing by ACM SIGHPC, the Association for Computing Machinery’s Special Interest Group on High Performance Computing.

She has also been recognized with multiple R&D 100 awards. Kathryn is widely respected among her peers for her collaboration and leadership.

An interview with HPC-AI Vanguard Kathryn Mohror: Managing Scientific Data Produced by HPC

What is your passion related to your career path?

My scientific passion is managing the data consumed and produced by scientific workloads on HPC systems. Many do not consider I/O and data management to be the most glamorous specialty (fast I/O will not get a supercomputer on the Top500 list!), but efficient I/O is essential for practical use of computing systems. If I/O operations are slow, it almost doesn’t matter how fast the simulation is: the computation stalls waiting on I/O, which delays the data needed for the simulation to progress and, ultimately, the scientific insight.
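
As a minimal illustration of that blocking effect (a hypothetical sketch of my own; compute_timestep and write_checkpoint are invented stand-ins, not code from any LLNL application, with sleeps modeling relative costs on a system with slow storage):

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-ins for a simulation's compute and I/O phases. */
static void compute_timestep(int step) { (void)step; usleep(10000);  /* 10 ms of "physics" */ }
static void write_checkpoint(int step) { (void)step; usleep(500000); /* 500 ms blocking write */ }

int main(void) {
    const int nsteps = 100, ckpt_interval = 10;
    for (int step = 0; step < nsteps; step++) {
        compute_timestep(step);
        /* With synchronous I/O, every checkpoint stalls the whole loop:
         * here the 10 checkpoints cost 5 s while all the compute costs
         * only 1 s, so the slow writes dominate the total runtime no
         * matter how fast the physics kernel is. */
        if (step % ckpt_interval == 0)
            write_checkpoint(step);
    }
    printf("done\n");
    return 0;
}
```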

Do you prefer working as an individual contributor or a team leader?

I always prefer to work in teams and strongly believe that the best scientific outcomes are the results of collaborative work. It’s true that an individual may have a brilliant idea, but that idea is more likely to have broad and lasting impact if it incorporates the input of others. When leading teams, I strive to pull in all voices from the team because you never know when a left-field question will result in that “aha” moment that transforms the work. And I think we have a great opportunity now that the AI and HPC communities are converging, where we have people from traditionally separate technical communities starting to work together. It’s exciting to imagine the progress and innovations we’ll be making in the next few years, which will be even more impactful if we are intentional about cross-collaboration and communication of best practices between the communities.

Share with us an event you’ve been involved with that brought about an advance, a new insight, an innovation, a step forward in computer science or scientific research.

A recent example I am proud of is the UnifyFS project, a collaboration primarily between LLNL and Oak Ridge National Laboratory (ORNL) that I led as part of ECP. UnifyFS provides easy and fast access to node-local storage available on HPC systems, e.g., Frontier at ORNL. UnifyFS transparently intercepts application I/O calls and manages file operations on node-local storage, which is much faster than using the system-wide parallel file system, especially at large scales. Because the interception is transparent, users do not need to change their application code to get the performance benefits. Our team demonstrated significant improvements in I/O performance for scientific applications, e.g., UnifyFS improved Flash-X’s checkpoint I/O by more than 50x. The UnifyFS team received a 2024 R&D 100 Award and the 2023 IPDPS Award for Open Source Software.
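
To give a feel for the transparent-interception technique in general, here is a minimal sketch of an LD_PRELOAD-style wrapper around open(). This is not UnifyFS’s actual implementation; the /checkpoint/ prefix and the node-local path are invented for illustration:

```c
/* Illustrative sketch of transparent I/O-call interception. Built as a
 * shared library and activated with LD_PRELOAD, it redirects the standard
 * open() call, so the application needs no source-code changes. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

int open(const char *path, int flags, ...) {
    /* Look up the real open() in libc. */
    int (*real_open)(const char *, int, ...) =
        (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    mode_t mode = 0;
    if (flags & O_CREAT) {          /* open() takes a mode only with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = (mode_t)va_arg(ap, int);
        va_end(ap);
    }

    /* Hypothetical policy: files under /checkpoint/ are redirected to
     * fast node-local storage instead of the parallel file system. */
    if (strncmp(path, "/checkpoint/", 12) == 0) {
        char local[4096];
        snprintf(local, sizeof(local), "/tmp/node_local%s", path);
        return real_open(local, flags, mode);
    }
    return real_open(path, flags, mode);
}
```

Compiled as a shared object (gcc -shared -fPIC -o libintercept.so intercept.c -ldl) and run via LD_PRELOAD, an unmodified binary would have its checkpoint files silently redirected to node-local storage. The real UnifyFS does much more than this sketch, such as presenting node-local files in a namespace shared across nodes, but the no-code-change principle is the same.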

Who or what has influenced you the most to help you advance your career path in this advanced computing community?

I am lucky to have had amazing mentors throughout my academic and working career, and that has really made all the difference for me. My PhD advisor and my postdoc mentors were outstanding role models who set me up for success. However, most of my mentoring relationships have been informal. I have found that many people are willing to help if you are open with them about the challenges you are facing. Many times, they will have faced something similar and are happy to give advice.

What are your thoughts on how we, the nation, build a stronger and deeper pipeline of talented and passionate HPC and AI professionals?

One way to build a stronger pipeline is to broaden the coverage of our outreach efforts to get more kids and young adults excited about STEM careers. To meet the challenges facing our nation and the world, we need all hands on deck and can’t afford to miss out on engaging people from every corner of the country. It’s convenient to focus outreach efforts on schools in close proximity to where STEM staff live and work (because then they don’t have to travel to participate in the outreach), but the kids and young adults in those regions are likely already aware that STEM careers are an option for them. Instead, we should put more emphasis on reaching children and young adults in regions with fewer STEM jobs and role models to raise awareness of potential STEM careers.

Another idea is to leverage the excitement around the convergence of AI and HPC to entice new people to the field. People are motivated towards careers with meaning, where their work can have a positive impact on the world. The combination of HPC and AI offers a unique opportunity to create the technological pathways that produce scientific insights faster than we ever have before. By discussing these possibilities more often and on larger platforms, we can inspire a wider audience and encourage more people to pursue careers in AI and HPC.

What does it take to be an effective leader in HPC and AI?

It takes a willingness to learn and adopt new technologies. For the HPC community, the addition of AI to HPC means we need to step back and reconsider our tightly held beliefs about how HPC systems should be run. We need to learn from our colleagues in the AI community and consider adopting ideas they have developed to support AI workloads. And of course, the reverse path is true, too. Leaders in HPC have deep expertise in getting top performance from massively parallel systems, and we can contribute our hard-earned best practices to the AI community as they are working to utilize these systems that we understand so well.

What is the biggest challenge you face in your current role?

This is a time of change for HPC and, while it’s a challenge, it’s also very exciting. Until recently, my work in data management and I/O for HPC largely focused on scientific simulations that have fairly regular I/O patterns. In recent years, our challenge has been to develop approaches that support those regular patterns at ever-increasing scales. Now, with the introduction of AI, the I/O workloads are not only quite different from those of traditional HPC simulations, but they are also changing quickly as people experiment with new methods of combining HPC and AI into workflows. It’s an exciting time to be a researcher in HPC, as we work together to meet the challenge of adapting our approaches to this new computing paradigm.

What changes do you see for the HPC / AI community in the next 5-10 years, and how do you see your own skills evolving during this time frame?

We are on the cusp of transformative outcomes from the merging of HPC and AI, and stronger information exchange between the two camps will get us there so much faster.

HPC has been around for decades, and the HPC community has the know-how for getting the maximum performance out of massively parallel systems and producing scientific results as quickly as possible. That said, the algorithms used in our current simulation applications are reaching their limits: to solve larger and finer-grained problems, we need ever-larger, power-hungry machines to compute the solutions. The good news is that incorporating AI alongside our simulations (e.g., as surrogate models that provide predictions of what the full simulation would compute) has the potential to greatly shortcut the time to scientific insight, using fewer computational resources than running a full HPC simulation. Of course, there is a lot of work to do to verify and mature this HPC+AI approach, since most AI work is done in the commercial domain and not specifically to solve the problems of scientific computing.
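
As a schematic of the surrogate-model pattern (a hypothetical sketch; full_simulation_step, surrogate_predict, the check interval, and the tolerance are all invented for illustration, not an actual LLNL workflow), the idea is to let a cheap learned model stand in for the expensive solver and fall back to full physics only when needed:

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical stand-ins: an expensive physics solver and a cheap
 * trained surrogate that approximates it. */
static double full_simulation_step(double state) { return state * 0.99 + 0.01; }
static double surrogate_predict(double state)    { return state * 0.99; }

int main(void) {
    double state = 1.0;
    const double tol = 1e-3;   /* invented accuracy threshold */

    for (int step = 0; step < 1000; step++) {
        double guess = surrogate_predict(state);   /* cheap: used on most steps */
        if (step % 100 == 0) {
            /* Periodically verify against the full solver; if the
             * surrogate has drifted, trust the expensive answer. */
            double truth = full_simulation_step(state);
            if (fabs(truth - guess) > tol)
                guess = truth;
        }
        state = guess;
    }
    printf("final state: %f\n", state);
    return 0;
}
```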

In contrast, over the last few years the AI community has begun using massively parallel systems and is facing many scalability challenges that the HPC community has already solved. For example, rules of thumb that are obvious to someone trained in HPC, e.g., never funnel I/O for a parallel job through a single process, are being newly learned by experts in the cloud and AI fields. The AI community could reach its goals much faster by not reinventing the wheel and instead incorporating the knowledge of HPC experts into its algorithms and workflows.
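
To make that rule of thumb concrete, here is a minimal MPI-IO sketch (my own illustration, not from the interview; the file name and buffer size are arbitrary) of the scalable alternative: instead of gathering all data to rank 0 and writing from that single process, every rank writes its own disjoint slice of a shared file with a collective call:

```c
#include <mpi.h>
#include <stdlib.h>

#define COUNT 1048576  /* doubles per rank; arbitrary for illustration */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(COUNT * sizeof(double));
    for (int i = 0; i < COUNT; i++) buf[i] = (double)rank;

    /* Anti-pattern (not shown): MPI_Gather everything to rank 0 and let
     * it write alone, serializing all I/O through one process. Instead,
     * every rank opens the file together and writes its own disjoint
     * region with a collective call, so I/O scales with the job size. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    MPI_Offset offset = (MPI_Offset)rank * COUNT * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, COUNT, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```

With the funneled approach, aggregate bandwidth is capped at what a single process can deliver; with the collective write, the MPI-IO layer can coordinate and stripe the requests across the parallel file system.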

Even though exchanging information between HPC and AI experts can clearly enable both camps to reach their performance objectives more quickly, there are challenges in creating communication channels across communities that have vastly different goals. A potential pathway to establishing communication could be to create intentional, relatively lightweight information exchange opportunities, e.g., workshops or tutorials at popular conferences for the AI/Cloud and HPC domains. Leaders from HPC and AI will hopefully recognize the benefits of increased information exchange and invest time and funds to send their staff to these exchange opportunities.

Do you believe science drives technology or technology drives science?

In my mind, science and technology are complementary drivers, moving us forward hand-in-hand. Technology is the application of scientific discoveries, so in that sense, science drives technology. Of course, in reality, many modern scientific discoveries could not be made (or at least not made as rapidly) without the aid of technology. For example, the Internet has accelerated research over the last few decades because we no longer have to go to the library to photocopy articles from paper journals, and supercomputers have made scientific insight possible without time-consuming and possibly dangerous real-world experiments. The same is true for the convergence of HPC and AI. Both HPC and AI are technologies that emerged from scientific research, and their combination will revolutionize many areas of science and industry.

Would you like to share anything about your personal life?

I love a good hike!