PNNL’s CENATE Taps ML to Guard DOE Supercomputers Against Illegitimate Workloads

Pacific Northwest National Lab sent along this article today by PNNL’s Allan Brettman, who writes about the advanced techniques used by the lab’s Center for Advanced Technology Evaluation (CENATE) “to judge HPC workload legitimacy that is as stealthy as an undercover detective surveying the scene through a two-way mirror.” These include machine learning methods, such as recurrent neural networks, that the center has found deliver prediction accuracy of more than 95 percent.

By Allan Brettman

By all appearances, Ang Li and Kevin Barker are computer scientists. But looks can be deceiving.

The Pacific Northwest National Laboratory (PNNL) duo are also high-tech sleuths, training powerful computers to perform gumshoe work protecting the nation from cybersecurity threats.

Li, Barker, PNNL colleagues, and university collaborators have developed a system to ferret out questionable use of high-performance computing (HPC) systems within the U.S. Department of Energy (DOE). As these HPC systems, arguably the largest and most sophisticated in the world, become more powerful, they face a potential threat from attackers seeking to run malicious software.

Tracking down nefarious users is just one example of work at PNNL’s Center for Advanced Technology Evaluation (CENATE), a computing proving ground supported by DOE’s Office of Science. Broadly, CENATE aims to understand the impact of advanced computing technologies on scientific workloads. In this role, Barker, CENATE co-principal investigator David Manz, and colleagues have developed a nonintrusive profiling framework to judge HPC workload legitimacy that is as stealthy as an undercover detective surveying the scene through a two-way mirror.

Removing Intruders

CENATE has led development of machine learning methods such as recurrent neural networks (RNNs) to classify the distinctive signatures of authorized and unauthorized workloads. With a prediction accuracy of more than 95 percent, this open-source framework can assist system administrators in identifying and removing unauthorized workloads and intruders, assuring system availability and integrity for legitimate scientific users.

“Machine learning methods are helping us to identify some of the key characteristics of workloads that represent legitimate scientific computing or something that might be anomalous that we would want to look into further,” said Barker, a PNNL computer scientist and CENATE principal investigator. “The machine learning algorithm can learn these patterns by looking at legitimate scientific codes.

“The machine learning algorithm can learn some of these patterns and then learn to distinguish between legitimate codes—something we expect to see—versus something that looks strange to us, something we’ll want to flag for a human operator to look into more deeply.”

Detecting Cyber-fingerprints

As you might suspect, there is not a gigantic data set that can be loaded into a supercomputer named “Catch the Bad Cyber Guys.”

In lieu of available information based on actual nefarious activities, researchers created a data set that reflected known, unallowable characteristics, said Li, a PNNL computer scientist. “We identified codes that illegitimate users might be running,” said Li.

Li and colleagues tapped into publicly available data from sources such as GitHub, GitLab, and Bitbucket to create their own smaller data set for identifying fingerprints of cybersecurity skullduggery such as cryptocurrency applications, password cracking activity, or longer-than-customary computer runtimes.

Trust, but Verify

Malicious codes slink into the worlds of cryptocurrency mining and password cracking, and RNNs are on the lookout for suspect behavior. How much data is being moved from a central processing unit (CPU) to a graphics processing unit (GPU)? What is the power consumption of the GPU memory? Eventually, CENATE computer scientists expect to have a greatly enhanced RNN that can expand the potential for finding anomalous clues in the underworld of malicious codes.
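To make the idea concrete, here is a minimal sketch of how a recurrent network could turn a stream of GPU measurements into a single suspicion score. It is not CENATE's framework: the function names, the weights, the two input features (CPU-to-GPU transfer and GPU memory power, each assumed to be scaled to the range 0 to 1), and the 0.5 threshold are all hypothetical, and the weights are hand-picked rather than trained.

```python
import math


def rnn_score(sequence, w_in, w_rec, bias, w_out, b_out):
    """Run a single-layer Elman RNN over a telemetry sequence.

    Each sample in `sequence` is a tuple of features. The hidden state
    carries information from earlier samples forward, and the final
    state is mapped through a sigmoid to a score between 0 and 1.
    """
    hidden = [0.0] * len(bias)
    for sample in sequence:
        new_hidden = []
        for j in range(len(bias)):
            total = bias[j]
            total += sum(w_in[j][k] * sample[k] for k in range(len(sample)))
            total += sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
            new_hidden.append(math.tanh(total))
        hidden = new_hidden
    logit = b_out + sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def flag_for_review(score, threshold=0.5):
    """Return True when a score should go to a human operator (hypothetical rule)."""
    return score >= threshold


# Hypothetical, untrained weights for 2 input features and 2 hidden units.
W_IN = [[0.8, 0.6], [-0.4, 0.9]]
W_REC = [[0.5, 0.0], [0.0, 0.5]]
BIAS = [0.0, 0.0]
W_OUT = [1.5, 1.0]
B_OUT = -1.0

# Made-up telemetry: (CPU-to-GPU transfer, GPU memory power), each scaled to [0, 1].
trace = [(0.2, 0.3), (0.9, 0.8), (0.9, 0.9)]
score = rnn_score(trace, W_IN, W_REC, BIAS, W_OUT, B_OUT)
print(f"score={score:.3f}, flag={flag_for_review(score)}")
```

In a real deployment the weights would be learned from labeled runs of legitimate and illegitimate codes, and the score would only prompt a closer look by a human operator rather than an automatic shutdown.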

Li and Barker, along with colleagues Pengfei Zou and Rong Ge of Clemson University, published a paper at the 2019 IEEE International Symposium on Workload Characterization that described how machine learning through real-time RNNs could detect illicit use of high-performance computers.

Misusing a supercomputing capability presents several problems, they reported. It not only deprives mission-critical and scientific applications of execution cycles, but also increases the chance for attackers to steal data, damage systems, and leverage the high computation and network bandwidth for attacking other sites.

Prospective users of DOE supercomputing resources undergo a high level of scrutiny before they’re granted access to some of the world’s most sophisticated equipment. That entrée comes with a level of trust, said Barker, which presents a challenge for detecting unauthorized computing.

“Once a user is approved, we kind of trust them to know what they’re doing,” said Barker. “So, having these kinds of automated, machine learning tools can help facility operators with filling that trust gap—to know in real time if a user is not doing what they’re supposed to.”