By Loudon Blair, Ciena Corporation
Exponential growth of computing power can’t last forever. We will hit a physical barrier where wires would need to be thinner than atoms. But when it comes to putting that computing power to work in high performance computing (HPC) or artificial intelligence (AI) applications, a shortage of compute is not what we should be most concerned about.
Powering AI and HPC requires smarter networks and high-performance connectivity. As HPC, AI and deep learning applications develop, so does demand for faster compute cycles, higher data transfer rates and ultra-reliable connectivity. Since HPC typically involves connecting to larger computing systems elsewhere in the world, the quality and bandwidth of these connections are crucial.
New AI and HPC applications mean more traffic on the network. More importantly, as these use cases become integrated into sectors like healthcare, manufacturing, and finance, many of them will require real-time, high-speed data processing and transfer.
Examples include genomic sequencing in the medical sector, which involves processing massive datasets from DNA sequences to identify genetic markers related to diseases or disorders. In the world of finance, AI and HPC are powering high-frequency trading, which analyzes market data and executes trades in milliseconds to capture price differences. Alongside high bandwidth, low-latency data transfer over long distances (for international trading) is crucial and can spell the difference between profit and loss.
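To put the latency point in perspective, light in optical fiber travels at roughly two-thirds of its speed in a vacuum, so distance alone sets a hard floor on delay. The rough calculation below is illustrative only; the route lengths are hypothetical, and real cable paths add further latency from indirect routing and equipment.

```python
# Rough, illustrative estimate of one-way fiber propagation delay.
# Assumes a group index of ~1.47 for standard single-mode fiber;
# route distances are approximate and for illustration only.

C_VACUUM_KM_PER_S = 299_792.458   # speed of light in vacuum (km/s)
FIBER_INDEX = 1.47                # typical group index of silica fiber

def one_way_delay_ms(route_km: float) -> float:
    """Propagation delay in milliseconds over a fiber route of route_km."""
    speed_in_fiber = C_VACUUM_KM_PER_S / FIBER_INDEX   # ~204,000 km/s
    return route_km / speed_in_fiber * 1_000

# Hypothetical route lengths (real cable paths are longer than straight lines)
for name, km in [("metro DCI", 80), ("regional", 600), ("transatlantic", 6_600)]:
    print(f"{name:>13}: ~{one_way_delay_ms(km):.2f} ms one-way")
```

Even before any processing time, a transatlantic route in this sketch costs on the order of tens of milliseconds each way, which is why physical distance matters so much to trading applications.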
This will require smarter networks that use software to identify areas of congestion and potential outages, and that can automatically respond, self-configure, self-optimize, and self-heal as needed. This next level of network intelligence is key to handling the massive datasets of AI and HPC applications.
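As a simplified illustration of the kind of closed-loop automation described above, a controller might poll link telemetry and trigger a reroute when utilization stays high for several intervals. This is a minimal sketch; the thresholds, class names, and the telemetry and reroute hooks are hypothetical, not any particular vendor’s API.

```python
# Minimal sketch of closed-loop network automation: poll link utilization,
# flag sustained congestion, and ask the controller to reroute traffic.
# All names, thresholds, and interfaces are hypothetical.

from dataclasses import dataclass, field
from collections import deque

CONGESTION_THRESHOLD = 0.85   # flag links running above 85% utilization
SUSTAINED_SAMPLES = 3         # ...for three consecutive polling intervals

@dataclass
class LinkMonitor:
    link_id: str
    history: deque = field(default_factory=lambda: deque(maxlen=SUSTAINED_SAMPLES))

    def record(self, utilization: float) -> bool:
        """Record a utilization sample; return True if congestion is sustained."""
        self.history.append(utilization)
        return (len(self.history) == SUSTAINED_SAMPLES
                and all(u > CONGESTION_THRESHOLD for u in self.history))

def control_loop(monitors, poll_telemetry, reroute):
    """One pass of the loop: poll each link and react to sustained congestion.

    poll_telemetry(link_id) -> float and reroute(link_id) stand in for the
    telemetry and path-computation interfaces a real controller would expose.
    """
    for m in monitors:
        if m.record(poll_telemetry(m.link_id)):
            reroute(m.link_id)   # shift demands onto less-loaded paths
```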
Will Networks Keep Up?
Network evolution is a constant, but these new HPC-AI use cases may move the needle faster. The AI surge is a key factor in the push towards 400 gigabits per second (400 Gb/s) connectivity, but this is just one step. Network operators are already targeting 800 Gb/s and even 1.6 Terabits per second (Tb/s) while enhancing network decision-making through smarter software and analytics.
The challenge isn’t just data center capability, but also the wider network infrastructure needed to transport information quickly and efficiently. To understand what that involves, it helps to recognize the different types of connections that make up a network.
Internal connections – Within data centers, ‘fabric’ networks of short links (tens of meters to less than 2 kilometers) interconnect servers, switches, and storage. These fabrics have recently evolved to accommodate GPU-based servers for AI, creating demand for ultra-high-bandwidth optical interconnects of 800G and 1.6T.
Data center-to-data center connectivity – Data Center Interconnect (DCI) networks connect data centers across varying geographical scopes, enabling them to function as a unified system. This model allows operators to distribute the workload more efficiently and optimize performance. DCI doesn’t just span cities and campuses—it can also link across countries and continents, typically using private networks.
Connecting DCs to users and applications – Finally, the wider network, essentially the internet, connects users to data centers, linking telco service providers with cloud and data center providers. A simple sketch summarizing these three tiers follows below.
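As a quick way to keep the three tiers straight, here is a minimal, illustrative summary in code; the reach and interface descriptions are typical approximations drawn from the text above, not specifications of any particular network.

```python
# Illustrative summary of the three connection tiers described above.
# Reach and interface notes are typical/approximate, not vendor specifications.

from dataclasses import dataclass

@dataclass
class ConnectionTier:
    name: str
    typical_reach: str
    typical_interface: str
    role: str

NETWORK_TIERS = [
    ConnectionTier(
        name="Data center fabric",
        typical_reach="tens of meters to <2 km",
        typical_interface="800G / 1.6T short-reach optics",
        role="Interconnects GPU servers, switches, and storage inside a facility",
    ),
    ConnectionTier(
        name="Data center interconnect (DCI)",
        typical_reach="campus and metro to intercontinental",
        typical_interface="coherent wavelengths over private networks",
        role="Lets distributed data centers operate as one system",
    ),
    ConnectionTier(
        name="DC-to-user connectivity",
        typical_reach="varies (effectively the internet)",
        typical_interface="telco and cloud provider networks",
        role="Connects users and applications to data center services",
    ),
]
```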
Across all of these connection types, networks also need new software to make them more intelligent and adaptable: able to predict potential problems, anticipate trends, and respond to changes on the network both proactively and reactively.
Coherent Technology and the Propagation Impairment Challenge
Increasing network capacity is not simply a case of laying more fiber; that approach doesn’t scale, and the environmental and financial costs are too great. It is about getting more out of existing fiber. The way to do this is with programmable optical technology and intelligent software that can finely tune network capacity, gather and analyze network insights, and instruct the network to adapt and adjust as needed. Advancements in coherent technology are making this possible.
Inside the data center, tasks such as large language model (LLM) training put tremendous strain on networks. Even though the distances are short, propagation impairments remain a challenge because of the ultra-high signal bandwidths involved.
Until now, non-coherent technologies such as intensity-modulated direct-detect (IM/DD) transmission with PAM4 signaling have been used to move data within data center fabrics. While these modulation approaches have typically cost less and consumed less power than coherent technology, they will be increasingly challenged to meet the demands of growing bandwidth in the future.
This means that as AI and HPC applications grow, coherent technology will be needed inside the data center. In the same way that coherent technology overcame the propagation impairments that limited IM/DD in long-distance networks, it will help overcome the same issues inside the data center as data rates grow.
Zooming out across the wider network, long-distance data transport is constrained by the amount of bandwidth on a single wavelength. In the age of AI and HPC, we will need to carry even more data per wavelength.
New developments in coherent technology are solving the challenge of propagation impairments that would otherwise limit bandwidth growth. Crucially, this technology makes it possible to maintain high data capacity over both short and long distances.
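To see roughly where the extra capacity per wavelength comes from, a coherent transceiver transmits on both polarizations of light and encodes several bits per symbol. The back-of-the-envelope sketch below uses approximate, illustrative figures (real links also carry forward-error-correction and framing overhead); it is not the specification of any particular module.

```python
# Back-of-the-envelope line rate for an optical signal.
# line_rate ≈ symbol_rate × bits_per_symbol × polarizations
# Values are approximate and for illustration only.

def raw_line_rate_gbps(symbol_rate_gbaud: float, bits_per_symbol: int,
                       polarizations: int = 2) -> float:
    return symbol_rate_gbaud * bits_per_symbol * polarizations

# PAM4 direct-detect lane: 1 polarization, 2 bits/symbol at ~100 Gbaud
print(raw_line_rate_gbps(100, 2, polarizations=1))   # ~200 Gb/s per lane

# Coherent dual-polarization 16QAM at ~120 Gbaud: 4 bits/symbol × 2 polarizations
print(raw_line_rate_gbps(120, 4))                    # ~960 Gb/s per wavelength
```

Packing more bits into each symbol and using both polarizations is what lets a single coherent wavelength carry far more data, provided the resulting propagation impairments can be compensated digitally.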
It’s Feasible, But Is It Sustainable?
These upgrades can be delivered through pluggable optical solutions in the data center or in smaller network nodes, improving capacity and performance within the same footprint. Pluggables are designed for various use cases, such as connecting over short or long distances, enabling interoperability between suppliers, and minimizing power consumption. This last consideration is especially pertinent.
Along with whether it’s physically possible to add this capacity to the network, another key consideration is operational and environmental sustainability. It becomes a question of footprint, space and power per bit, and OPEX. Over the past five decades, increased chip capacity has been accompanied by lower costs. Networking needs to follow the same trend, and so far it is doing so.
Again, this is where the evolution of coherent technology comes into play, not only delivering the capabilities required but doing so in a cost-effective and power-efficient manner. The right pluggable can double capacity while halving the power consumed per bit. This is achieved through DSP algorithms, advanced Complementary Metal-Oxide-Semiconductor (CMOS) technology for enhanced integration, and electro-optic miniaturization using photonic integration.
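As a simple illustration of the ‘power per bit’ metric, the calculation below compares two hypothetical pluggable generations; the wattages and capacities are placeholder values chosen for arithmetic clarity, not measurements of real modules.

```python
# Power-per-bit comparison for two hypothetical pluggable generations.
# Wattages and capacities are illustrative placeholders, not product specs.

def power_per_bit_pj(module_watts: float, capacity_gbps: float) -> float:
    """Energy per transported bit, in picojoules (1 W / 1 Gb/s = 1000 pJ/bit)."""
    return module_watts / capacity_gbps * 1_000

gen_a = power_per_bit_pj(module_watts=20, capacity_gbps=400)   # 50 pJ/bit
gen_b = power_per_bit_pj(module_watts=20, capacity_gbps=800)   # 25 pJ/bit

print(f"Gen A: {gen_a:.0f} pJ/bit, Gen B: {gen_b:.0f} pJ/bit")
print(f"Capacity x{800/400:.0f}, power per bit x{gen_b/gen_a:.1f}")
```

Holding module power flat while doubling capacity is one way the “double the capacity, half the power per bit” outcome can be realized in practice.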
Advancements in coherent technology have revolutionized data transport across the wider network and will soon be required to do the same within data centers. By leveraging programmable technology and intelligent software, network operators can optimize capacity, gather valuable insights and adapt to changing customer needs. As we continue to push the boundaries of computing power, the evolution of networks remains essential to support the transformative potential of HPC and AI.
Loudon Blair is Senior Director, Corporate Strategy & Development, Ciena Corporation.