RPI Computer Scientist Wins NSF Award to Match Exascale Systems with Petascale Data Volumes

Rensselaer Polytechnic Institute faculty member and computer scientist George Slota has been granted a National Science Foundation Faculty Early Career Development (CAREER) award to work on the problem of enabling exascale-class systems to handle gigantic, petascale-class volumes of data.

“How do we best understand and get insight from this kind of data? To do that, we have to map the data to the hardware, with consideration of the algorithm itself,” said Slota, who has been awarded a $490,000 CAREER grant. “Each aspect is fairly challenging because of the complexity of the data and the complexity of modern hardware.”

The next generations of supercomputers will be powerful enough to analyze interactions as complex as those between the users of a social network or between the neurons in a human brain. But the match between a machine that works on the exascale (10^18 operations per second) and a complex dataset on the petascale (10^15 elements) presents multiple challenges and must be carefully orchestrated.

RPI cited the example of a social network as large as Twitter or Facebook posing a massively complex data problem. “In the world of data, each user can be described as a point or ‘node,’ and each interaction between users is a line or ‘edge’ between the nodes. There are nearly limitless attributes for a node – geographical location, age, favorite tree – and the number of edges that connect users are apt to be highly irregular given that some people have millions of connections and others just a handful. The collection of nodes and edges, called ‘graph structured data,’ presents two problems to computer scientists in that the data set can be both massive and irregular.”
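To make the node-and-edge picture concrete, here is a minimal Python sketch of graph-structured data. The user names, attributes and interactions are invented for illustration and do not come from RPI or from Slota's work; the point is only to show how edge counts can vary from node to node.

```python
from collections import defaultdict

# Hypothetical toy data: each pair is one interaction ("edge")
# between two users ("nodes").
edges = [
    ("ana", "bo"), ("ana", "cy"), ("ana", "di"), ("ana", "ed"),
    ("bo", "cy"), ("ed", "fay"),
]

# Node attributes are open-ended key-value pairs; not every node
# needs to carry the same ones.
attributes = {
    "ana": {"location": "Troy, NY", "favorite_tree": "oak"},
    "fay": {"location": "Lyon"},
}


def build_adjacency(edge_list):
    """Map each node to the set of nodes it shares an edge with."""
    adjacency = defaultdict(set)
    for u, v in edge_list:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def count_degrees(adjacency):
    """Return the number of edges attached to each node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


adjacency = build_adjacency(edges)
degree = count_degrees(adjacency)

# Print nodes from most to least connected.
for node in sorted(degree, key=degree.get, reverse=True):
    print(node, degree[node])
```

Even in this six-user toy network, one user holds four connections while two users hold one each. At the scale of a real social network that skew becomes far more extreme.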

“There isn’t any inherent kind of pattern or structure to a lot of these networks,” Slota said. “So it becomes computationally challenging to work with these data sets because you can’t make any assumptions about what’s going on under the hood without actually first studying what’s going on under the hood.”

With the grant, Slota plans to develop a “graph layout,” a high-quality and scalable means of partitioning, ordering and storing the data given the data type, the relevant algorithms and the hardware platform that will be used to analyze it. Once the data is loaded in a way that makes sense, the second challenge is processing it. In many ways, these are similar problems.
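The three steps named above (ordering, storing and partitioning) can be illustrated with a deliberately simple sketch. This is not Slota's method, which the article does not detail. It assumes three generic choices: relabeling nodes by descending degree, storing the graph in the widely used compressed sparse row (CSR) format, and splitting nodes into contiguous ranges with roughly equal edge counts. All function names and the toy edge list are hypothetical.

```python
def count_degrees(num_nodes, edge_list):
    """Count how many edges touch each node of an undirected graph."""
    counts = [0] * num_nodes
    for u, v in edge_list:
        counts[u] += 1
        counts[v] += 1
    return counts


def relabel_by_degree(num_nodes, edge_list):
    """Ordering: give the most-connected nodes the smallest ids."""
    counts = count_degrees(num_nodes, edge_list)
    order = sorted(range(num_nodes), key=lambda n: -counts[n])  # stable sort
    new_id = [0] * num_nodes
    for new, old in enumerate(order):
        new_id[old] = new
    return [(new_id[u], new_id[v]) for u, v in edge_list]


def build_csr(num_nodes, edge_list):
    """Storing: pack all neighbor lists into one flat array (CSR)."""
    counts = count_degrees(num_nodes, edge_list)
    offsets = [0] * (num_nodes + 1)
    for node in range(num_nodes):
        offsets[node + 1] = offsets[node] + counts[node]
    neighbors = [0] * offsets[num_nodes]
    cursor = offsets[:-1].copy()  # next free slot for each node
    for u, v in edge_list:
        neighbors[cursor[u]] = v
        cursor[u] += 1
        neighbors[cursor[v]] = u
        cursor[v] += 1
    return offsets, neighbors


def partition_by_edges(offsets, num_parts):
    """Partitioning: contiguous node ranges with similar edge counts."""
    num_nodes = len(offsets) - 1
    total = offsets[-1]
    boundaries = [0]
    for part in range(1, num_parts):
        target = total * part / num_parts
        node = boundaries[-1]
        while node < num_nodes and offsets[node] < target:
            node += 1
        boundaries.append(node)
    boundaries.append(num_nodes)
    return boundaries


# Hypothetical toy graph: six nodes, six undirected edges.
raw_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (4, 5)]
ordered_edges = relabel_by_degree(6, raw_edges)
offsets, neighbors = build_csr(6, ordered_edges)
boundaries = partition_by_edges(offsets, 2)

# Part p owns nodes boundaries[p] .. boundaries[p + 1] - 1.
print(offsets, neighbors, boundaries)
```

In this toy case the two parts hold two and four nodes respectively, yet each stores six neighbor entries. Balancing by edges rather than by nodes is one common response to the irregularity Slota describes; a production layout would also have to weigh the algorithm and the memory hierarchy of the target machine.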

Slota will work to map the data and the algorithmic analysis method to the equally complex and irregular architecture of the supercomputer, with its network of interconnected computers, processors and multiple levels of memory.

And then, Slota said, “we want to bring it home,” by which he means producing scalable open-source software, software frameworks and toolkits that will enable the broader scientific community to easily address these challenges as related to their specific dataset, analytical problem and hardware.

“The ideal solution is a general purpose way that, given any arbitrary dataset, we can load it, process it, and solve some arbitrary problem that’s relevant to a biologist who’s studying the brain, or a physicist studying particle interactions, or whatever the application may be,” Slota said.

“A key challenge facing data-driven scientific advancement is the ability to organize and extract meaning from massive datasets associated with many fields of present-day research,” said Curt Breneman, dean of the RPI School of Science. “George’s work points the way towards new solutions to this critical data analysis bottleneck. This CAREER award recognizes his potential, as well as the importance of this research, and we congratulate him on this recognition.”