[CONTRIBUTED THOUGHT PIECE] In-memory computing promises to revolutionize AI inference. Given the rapid adoption of generative AI, it makes sense to pursue a new approach that reduces cost and power consumption by bringing compute into memory while improving performance.
Industry Heavyweights Form Ultra Ethernet Consortium for HPC and AI
SAN FRANCISCO – July 19, 2023 – A host of industry heavyweights have formed the Ultra Ethernet Consortium (UEC), intended to promote “industry-wide cooperation to build a complete Ethernet-based communication stack architecture for high-performance networking” for HPC and AI workloads, the new group said. Founding members include AMD, Arista, Broadcom, Cisco, Eviden (an Atos Business), […]
NVIDIA Launches Inference Platforms for Large Language Models and Generative AI Workloads
NVIDIA launched four inference platforms optimized for a diverse set of rapidly emerging generative AI applications — helping developers quickly build specialized, AI-powered applications that can deliver new services and insights. The platforms combine NVIDIA’s full stack of inference software with the latest NVIDIA Ada, NVIDIA Hopper™ and NVIDIA Grace Hopper™ processors — including the NVIDIA L4 Tensor Core GPU and the NVIDIA H100 NVL GPU, both launched at GTC.
MLCommons: Latest MLPerf AI Benchmark Results Show Machine Learning Inference Advances
SAN FRANCISCO – September 8, 2022 – Today, the open engineering consortium MLCommons announced results from MLPerf Inference v2.1, which analyzes the performance of inference — the application of a trained machine learning model to new data. Inference allows for the intelligent enhancement of a vast array of applications and systems. Here are the results and […]
Intel’s Habana Labs Launches Second-Generation AI Processors for Training and Inferencing
Intel announced that Habana Labs, its data center team focused on AI deep learning processor technologies, launched its second-generation deep learning processors for training and inference: Habana® Gaudi®2 and Habana® Greco™. These new processors address an industry gap by providing customers with high-performance, high-efficiency deep learning compute choices for both training workloads and inference deployments in the data center while lowering the AI barrier to entry for companies of all sizes.
TensorRT 8 Provides Leading Enterprises Fast AI Inference Performance
NVIDIA today launched TensorRT™ 8, the eighth generation of the company’s AI software, which slashes inference time in half for language queries — enabling developers to build the world’s best-performing search engines, ad recommendations and chatbots and offer them from the cloud to the edge.
MLCommons Launches MLPerf Tiny AI Inference Benchmark
Today, open engineering consortium MLCommons released a new benchmark, MLPerf Tiny Inference, to measure trained neural network AI inference performance for low-power devices in small form factors. MLPerf Tiny v0.5 is MLCommons’ first inference benchmark suite for embedded device machine learning, a growing field in which AI-driven sensor data analytics is performed in real-time, close […]
LeapMind Unveils Efficiera Ultra Low-Power AI Inference Accelerator IP
Today LeapMind announced Efficiera, an ultra-low-power AI inference accelerator IP for companies that design ASIC and FPGA circuits, along with other related products. Efficiera will enable customers to develop cost-effective, low-power edge devices and accelerate the go-to-market of custom devices featuring AI capabilities. “This product enables the inclusion of deep learning capabilities in various edge devices that are technologically limited by power consumption and cost, such as consumer appliances (household electrical goods), industrial machinery (construction equipment), surveillance cameras, and broadcasting equipment as well as miniature machinery and robots with limited heat dissipation capabilities.”
Gyrfalcon Acceleration Chips Speed SolidRun AI Inference Server
Today SolidRun introduced a new Arm-based AI inference server optimized for the edge. Highly scalable and modular, the Janux GS31 supports today’s leading neural network frameworks and can be configured with up to 128 Gyrfalcon Lightspeeur SPR2803 AI acceleration chips for unrivaled inference performance for today’s most complex video AI models. “While GPU-based inference servers have seen significant traction for cloud-based applications, there is a growing need for edge-optimized solutions that offer powerful AI inference with less latency than cloud-based solutions. Working with Gyrfalcon and utilizing their industry-proven ASICs has allowed us to create a powerful, cost-effective solution for deploying AI at the Edge that offers seamless scalability.”
NVIDIA Tops MLPerf AI Inference Benchmarks
Today NVIDIA posted the fastest results on new benchmarks measuring the performance of AI inference workloads in data centers and at the edge — building on the company’s equally strong position in recent benchmarks measuring AI training. “NVIDIA topped all five benchmarks for both data center-focused scenarios (server and offline), with Turing GPUs providing the highest performance per processor among commercially available entries.”