Today Xilinx announced the expansion of its 16 nanometer (nm) Virtex UltraScale+ family to now include the world’s largest FPGA — the Virtex UltraScale+ VU19P. With 35 billion transistors, the VU19P provides the highest logic density and I/O count on a single device ever built, enabling emulation and prototyping of tomorrow’s most advanced ASIC and SoC technologies, as well as test, measurement, compute, networking, aerospace and defense-related applications.
The VU19P sets a new standard in FPGAs, featuring 9 million system logic cells, up to 1.5 terabits per second of DDR4 memory bandwidth, up to 4.5 terabits per second of transceiver bandwidth, and over 2,000 user I/Os. It enables the prototyping and emulation of today’s most complex SoCs as well as the development of emerging, complex algorithms such as those used for artificial intelligence, machine learning, video processing and sensor fusion. The VU19P is 1.6X larger than its predecessor, the 20 nm Virtex UltraScale 440 FPGA, which was previously the industry’s largest FPGA.
“The VU19P enables developers to accelerate hardware validation and begin software integration before their ASIC or SoC is available,” said Sumit Shah, senior director, product line marketing and management, Xilinx. “This is our third generation of world-record FPGAs. First was the Virtex-7 2000T, followed by the Virtex UltraScale VU440, and now the Virtex UltraScale+ VU19P. But this is more than silicon technology; we’re providing robust and proven tool flows and IP to support it.”
Features include:
- Highest Logic Capacity. 9 million system logic cells allow designers to emulate and prototype larger-scale, more complex designs, and create customized test logic for test-equipment vendors.
- I/O Capacity and Bandwidth. Massive I/O bandwidth is not only ideal for multi-FPGA interconnect but also allows engineers to connect a broad range of external memory types and rates to implement fast, deep storage of state information.
- High-Speed Transceivers. 80 GTY (28Gb/s) transceivers offer up to 4.5Tb/s transceiver bandwidth, which is suited for high port density test equipment and next-generation platforms using emerging interface standards and protocols.
- Superior Cooling. Lidless packaging improves heat removal, giving designers more thermal headroom to push performance. This makes it easier to deploy high-performance systems in thermally constrained environments.
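The 4.5Tb/s transceiver figure in the list above can be checked with simple arithmetic. The sketch below assumes the figure counts both transmit and receive directions of all 80 transceivers at the quoted 28Gb/s line rate; that interpretation, and the names used here, are assumptions for illustration rather than something stated in the announcement.

```python
# Figures quoted in the feature list above.
GTY_COUNT = 80          # number of GTY transceivers
LINE_RATE_GBPS = 28     # per-transceiver line rate in Gb/s

# Assumption: the aggregate figure counts transmit and receive together.
DIRECTIONS = 2


def aggregate_tbps(count, rate_gbps, directions):
    """Return aggregate bandwidth in Tb/s for `count` links at `rate_gbps`."""
    return count * rate_gbps * directions / 1000


total = aggregate_tbps(GTY_COUNT, LINE_RATE_GBPS, DIRECTIONS)
print(f"Aggregate transceiver bandwidth: {total:.2f} Tb/s")
```

Under that assumption the result rounds to the quoted "up to 4.5Tb/s".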
The VU19P is supported by an extensive set of debug and visibility tools and IP, providing customers with a comprehensive development platform to quickly design and validate next-generation applications and technologies. Hardware and software co-validation allows developers to bring up software and implement custom features before physical parts are available. Moreover, the design flow can be co-optimized using the Xilinx Vivado Design Suite, which reduces cost and tape-out risk and improves efficiency and time-to-market.
“Arm relies on Xilinx devices as part of our process for validating our next-generation processor IP and SoC technology,” said Tran Nguyen, director of design services, Arm. “The new VU19P will further enable Arm, and many others in our ecosystem, to accelerate the design, development and validation of our most ambitious roadmap technologies.”
The VU19P will be generally available in the fall of 2020.
The operations of the fast Walsh-Hadamard transform are only additions and subtractions, so they can be both parallelized and pipelined to the point where the transform ceases to be a rate-limiting step. Implementing parameterized nonlinear functions that involve multiply operations would take longer, but could still run faster than the external memory system can supply data. You can put those together and have a neural network whose speed is limited only by memory bandwidth:
https://github.com/S6Regen/Fixed-Filter-Bank-Neural-Networks
That would be very useful for applications like video compression and molecular dynamics simulations where very rapid evaluations are needed.
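To make the idea concrete, here is a minimal software sketch of one such layer: a fixed fast Walsh-Hadamard transform followed by a per-element parameterized nonlinearity. The two-slope activation, the ordering of the two steps, and the names `fwht`, `two_slope` and `layer` are illustrative assumptions, not code taken from the linked repository, and a hardware version would unroll the butterfly stages rather than loop over them.

```python
import math


def fwht(values):
    """Return the unnormalized fast Walsh-Hadamard transform of `values`.

    The input length must be a power of two. Only additions and
    subtractions are used: log2(n) stages of n/2 butterflies each.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # The butterflies within one stage are independent of each other,
        # which is what makes the transform easy to parallelize and pipeline.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def two_slope(x, pos_slope, neg_slope):
    """Hypothetical parameterized activation: one multiply per element."""
    return pos_slope * x if x >= 0 else neg_slope * x


def layer(values, pos_slopes, neg_slopes):
    """Fixed transform (scaled by 1/sqrt(n)) followed by learned slopes."""
    scale = 1.0 / math.sqrt(len(values))
    transformed = [scale * v for v in fwht(values)]
    return [two_slope(v, p, q)
            for v, p, q in zip(transformed, pos_slopes, neg_slopes)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(fwht(x))
    print(layer(x, [1.0] * 4, [0.1] * 4))
```

The only trainable parameters in this sketch are the per-element slopes, so each layer needs to fetch just two numbers per element from memory, which is consistent with the point that memory bandwidth becomes the limit.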