Overcoming Roadblocks in Computational Networks

Mariam Kiran from ESnet

Researchers at ESnet are working on new algorithms to identify network problems and find solutions on the fly so information moves quickly and on time.

Like other complex systems, computer networks can break down and suffer bottlenecks. Keeping such systems running requires algorithms that can identify problems and find solutions on the fly so information moves quickly and on time.

Mariam Kiran – a network engineer for the Energy Sciences Network (ESnet), a DOE Office of Science user facility managed by Lawrence Berkeley National Laboratory – is using an early-career research award from DOE’s Office of Science to develop methods combining machine-learning algorithms with parallel computing to optimize such networks.

This type of science and the problems it can address can make a real impact, Kiran says. “That’s what excites me about research – that we can improve or provide solutions to real-world problems.”

Kiran’s interest in science and mathematics was fueled by Doctor Who and other popular television shows she watched in her native United Kingdom. At 15 she got her first taste of computer programming, through a school project, using the BASIC programming language to create an airline database system. “I added a lot of graphics so that if you entered a wrong password, two airplanes would come across (the screen) and crash,” she says. It felt great to use a computer to create something out of nothing.

Kiran’s economist father and botanist mother encouraged her interests and before long she was studying computer science at the University of Sheffield. Pop culture also influenced her interests there, at a time when many students dressed in long black coats like those seen in the blockbuster movie The Matrix. The core computer science concept from that film – using computer simulations to test complex theories – was appealing.

She started coding such simulations, but along the way discovered another interest: developing ways around computer science roadblocks in those experiments. With simulations “you have potentially too much data to be processed, so you need a very fast and good system on the back end to make sure that the simulation goes as fast as it can,” she says. That challenge got her interested in computing and network infrastructure such as high-performance computing systems and cloud computing. She wanted to understand the problems and find strategies that help software run correctly and smoothly.

Kiran’s interest led her to join the software engineering and testing group at the University of Sheffield, where she also completed her master’s degree and Ph.D. She was part of a team that assembled a simulation platform for coding interacting components of a complex system – or agent-based modeling, used widely in Europe to calculate problems in economics or biology. Each agent could represent a government, a person, an individual organism, or a cell. “You code everything up as an agent and then let them interact with other agents, randomly or by following certain rules, and see how the system reacts overall.”
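The approach Kiran describes can be illustrated with a toy example. The Python sketch below is not her team's platform; the `Agent` class, the one-unit exchange rule and the `run_simulation` helper are hypothetical names chosen for illustration. It only shows the general pattern: define agents, pair them at random, apply a rule, and then look at the system as a whole.

```python
import random


class Agent:
    """A hypothetical agent holding one state variable (units of wealth)."""

    def __init__(self, wealth):
        self.wealth = wealth

    def interact(self, other):
        # Illustrative rule: hand one unit to the other agent if any is left.
        if self.wealth > 0:
            self.wealth -= 1
            other.wealth += 1


def run_simulation(num_agents=50, steps=1000, seed=0):
    """Pair agents at random, apply the rule, and return the sorted wealths."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    agents = [Agent(wealth=10) for _ in range(num_agents)]
    for _ in range(steps):
        giver, receiver = rng.sample(agents, 2)  # two distinct agents
        giver.interact(receiver)
    return sorted(agent.wealth for agent in agents)


wealths = run_simulation()
print("min:", wealths[0], "max:", wealths[-1], "total:", sum(wealths))
```

Every agent starts out identical, yet the random interactions spread the wealth unevenly while the total stays fixed. That kind of system-level pattern is what agent-based models are built to expose.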

In 2014, she joined the UK’s University of Bradford as an associate professor and taught software engineering and machine learning. However, her research interests in performance optimization of computing and networks led her to investigate new projects that examined similar problems in applications that run over distributed compute and network resources. As a result, in 2016 she joined ESnet, which supports international science research computing networks and has produced a variety of innovations such as TCP and high-speed connections.

With her early-career grant, Kiran has five years of support to pursue software innovations that can manage the efficiency of today’s computer networks and take them to the next level. Machine learning algorithms – such as deep neural networks used for image recognition and analysis – can be exploited to understand user behavior and data-movement patterns across the network. A computer network is a complex distributed system. How one heals itself or performs corrective measures at the edge while operating optimally overall is an interesting challenge to understand and solve, Kiran says.

Managing information across networks is like transporting cargo on a highway system, she says. “You’re moving data from one building to the next building, and you have to find the shortest possible route.” The fastest path might depend on the time of day and traffic patterns.

Some science applications, however, are deadline-driven and require data to arrive by certain times to succeed. Short routes might become overly congested, whereas slightly longer paths may be under-used.

In the end, it’s a dynamic, multi-objective problem – finding the best possible route for data, one that is fast and less congested.
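The trade-off between a short but congested route and a longer, under-used one can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ESnet's routing method: the `best_route` function, the `congestion_weight` parameter, the cost formula and the four-node topology are all hypothetical. The sketch folds both objectives into a single link cost and runs Dijkstra's shortest-path algorithm over it.

```python
import heapq


def best_route(graph, source, target, congestion_weight=1.0):
    """Dijkstra's algorithm over a combined link cost.

    Assumed cost model: latency_ms * (1 + congestion_weight * utilization),
    where utilization is the fraction of link capacity in use (0.0 to 1.0).
    """
    queue = [(0.0, source, [source])]  # (cost so far, node, path taken)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, (latency_ms, utilization) in graph.get(node, {}).items():
            if neighbor not in settled:
                edge_cost = latency_ms * (1.0 + congestion_weight * utilization)
                heapq.heappush(queue, (cost + edge_cost, neighbor, path + [neighbor]))
    return float("inf"), []  # target not reachable


# Hypothetical topology: A-B-D is shorter but busy, A-C-D is longer but idle.
links = {
    "A": {"B": (5, 0.9), "C": (7, 0.1)},
    "B": {"D": (5, 0.9)},
    "C": {"D": (7, 0.1)},
    "D": {},
}

_, latency_only = best_route(links, "A", "D", congestion_weight=0.0)
_, congestion_aware = best_route(links, "A", "D", congestion_weight=1.0)
print("latency only:", latency_only)
print("congestion aware:", congestion_aware)
```

With `congestion_weight` set to zero the search picks the shorter A-B-D path; once congestion is penalized it switches to A-C-D. On a live network the utilization figures would shift through the day, so the preferred route would have to be recomputed as conditions change.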

“Throughout the day, the state of the network changes depending on the users and applications interacting on it,” Kiran notes. “Understanding these complex relationships is a challenge. I’m interested in seeing whether machine learning can help us understand these more and allow networks to automate corrective measures in near-real time to prevent outages and application failures.”

She’s now identifying main problems along autonomous networks and applying those lessons to analogous computational and network problems. For example, she’s examining how engineers deal with outage-triggering bottlenecks and how bandwidth is controlled across links. Being at ESnet, which has led networking research for years, provides immense experience and capabilities to learn and apply solutions to a high-speed network that is built to think, she says.

Better-functioning networks could speed computational research on a range of topics, including climate, weather and nuclear energy. High performance computing boosts these calculations by rapidly distributing them across multiple computers and processors, sometimes across the world. It also allows international scientists to collaborate quickly. Researchers at diverse locations from Berkeley Lab to Switzerland’s CERN to labs in South America can interact with data quickly and seamlessly and develop new theories and findings.

Source: Department of Energy
