Xilinx Steps Up with Alveo FPGA boards and Versal Adaptive Compute Acceleration Platform

Today at the Xilinx Developer Forum, FPGA maker Xilinx unveiled Versal, “the industry’s first adaptive compute acceleration platform (ACAP).” The company also announced new Alveo FPGA cards, which it claims can deliver “4X the performance of GPUs, 90X the performance of CPUs, plus unprecedented adaptability across workloads.”

“With the explosion of AI and big data and the decline of Moore’s Law, the industry has reached a critical inflection point. Silicon design cycles can no longer keep up with the pace of innovation,” says Xilinx CEO Victor Peng. “Four years in development, Versal is the industry’s first ACAP. We uniquely designed it to enable all types of developers to accelerate their whole application with optimized hardware and software and to instantly adapt both to keep pace with rapidly evolving technology. It is exactly what the industry needs at the exact moment it needs it.”

Versal ACAPs combine Scalar Processing Engines, Adaptable Hardware Engines, and Intelligent Engines with leading-edge memory and interfacing technologies to deliver powerful heterogeneous acceleration for any application. Most importantly, the Versal ACAP’s hardware and software can be programmed and optimized by software developers, data scientists, and hardware developers alike, supported by a host of tools, software, libraries, IP, middleware, and frameworks that enable industry-standard design flows.

Built on TSMC’s 7-nanometer FinFET process technology, the Versal portfolio is the first platform to combine software programmability with domain-specific hardware acceleration and the adaptability needed to keep up with today’s rapid pace of innovation. The portfolio includes six series of devices uniquely architected to deliver scalability and AI inference capabilities for a host of applications across different markets, from cloud to networking to wireless communications to edge computing and endpoints.

Executives from AMD, Arm, Twitch, and Nokia also made announcements at the Forum about their companies’ use of Xilinx FPGAs and SoCs. AMD, one of the Xilinx partners showcasing products based on the new Alveo boards, announced a server that the companies say sets a new world record for real-time AI inference processing, with a mind-boggling inference throughput of 30,000 images per second.

The portfolio includes the Versal Prime series, Premium series and HBM series, which are designed to deliver industry-leading performance, connectivity, bandwidth, and integration for the most demanding applications. It also includes the AI Core series, AI Edge series, and AI RF series, which feature the breakthrough AI Engine. The AI Engine is a new hardware block designed to address the emerging need for low-latency AI inference for a wide variety of applications and also supports advanced DSP implementations for applications like wireless and radar. It is tightly coupled with the Versal Adaptable Hardware Engines to enable whole application acceleration, meaning that both the hardware and software can be tuned to ensure maximum performance and efficiency.

The portfolio debuts with the Versal Prime series, delivering broad applicability across multiple markets, and the Versal AI Core series, delivering an estimated 8X AI inference performance boost versus industry-leading GPUs:

  • The Versal AI Core series delivers the portfolio’s highest compute and lowest latency, enabling breakthrough AI inference throughput and performance. The series is optimized for cloud, networking, and autonomous technology, offering the highest range of AI and workload acceleration available in the industry. The Versal AI Core series has five devices, offering 128 to 400 AI Engines. The series includes dual-core Arm Cortex™-A72 application processors, dual-core Arm Cortex-R5 real-time processors, 256KB of on-chip memory with ECC, and more than 1,900 DSP engines optimized for high-precision floating point with low latency. It also incorporates more than 1.9 million system logic cells combined with more than 130Mb of UltraRAM, up to 34Mb of block RAM, 28Mb of distributed RAM, and 32Mb of new Accelerator RAM blocks, which can be accessed directly from any engine and are unique to the Versal AI series, all to support custom memory hierarchies. The series also includes PCIe Gen4 8-lane and 16-lane and CCIX host interfaces, power-optimized 32Gb/s SerDes, up to 4 integrated DDR4 memory controllers, up to 4 multi-rate Ethernet MACs, 650 high-performance I/Os for MIPI D-PHY, NAND, storage-class memory interfacing, and LVDS, plus 78 multiplexed I/Os to connect external components and more than 40 HD I/Os for 3.3V interfacing. All of this is interconnected by a state-of-the-art network-on-chip (NoC) with up to 28 master/slave ports, delivering multi-terabit-per-second bandwidth at low latency combined with power efficiency and native software programmability. The full product table is now available.
  • The Versal Prime series is designed for broad applicability across multiple markets and is optimized for connectivity and in-line acceleration of a diverse set of workloads. This mid-range series is made up of nine devices, each including dual-core Arm® Cortex-A72 application processors, dual-core Arm Cortex-R5 real-time processors, 256KB of on-chip memory with ECC, and more than 4,000 DSP engines optimized for high-precision floating point with low latency. It also incorporates more than 2 million system logic cells combined with more than 200Mb of UltraRAM, more than 90Mb of block RAM, and 30Mb of distributed RAM to support custom memory hierarchies. The series also includes PCIe® Gen4 8-lane and 16-lane and CCIX host interfaces, power-optimized 32Gb/s SerDes and mainstream 58Gb/s PAM4 SerDes, up to 6 integrated DDR4 memory controllers, up to 4 multi-rate Ethernet MACs, 700 high-performance I/Os for MIPI D-PHY, NAND, storage-class memory interfaces, and LVDS, plus 78 multiplexed I/Os to connect external components and more than 40 HD I/Os for 3.3V interfacing. All of this is interconnected by a state-of-the-art network-on-chip (NoC) with up to 28 master/slave ports, delivering multi-terabit-per-second bandwidth at low latency combined with power efficiency and native software programmability. The full product table is now available.
  • Development Software Portfolio. The Versal portfolio is enabled by a development environment with a comprehensive software stack including drivers, middleware, libraries and software framework support. More details on the software programming tools will be made available next year.

The Versal Prime series and Versal AI Core series will be generally available in the second half of 2019.
