NVIDIA Reports Blackwell Surpasses 1000 TPS/User Barrier with Meta’s Llama 4 Maverick

NVIDIA said it has set a record for large language model (LLM) inference speed, announcing that a DGX B200 node with eight NVIDIA Blackwell GPUs delivered more than 1,000 tokens per second (TPS) per user on the 400-billion-parameter Llama 4 Maverick model. The company said the model is the largest and most powerful in the Llama 4 […]
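
For context, TPS per user measures the decode rate each individual user sees under concurrent load, not the node's aggregate throughput. Here is a minimal sketch of how such a figure is typically derived (illustrative numbers only, not NVIDIA's benchmark harness):

```python
# Illustrative sketch: deriving a per-user tokens-per-second (TPS/user)
# figure from a batch of concurrent generation requests. Numbers are
# hypothetical; this is not NVIDIA's measurement methodology.

def tps_per_user(tokens_per_user: list[int], wall_clock_seconds: float) -> float:
    """Average decode throughput observed by each concurrent user."""
    per_user_rates = [n / wall_clock_seconds for n in tokens_per_user]
    return sum(per_user_rates) / len(per_user_rates)

# Eight concurrent users each receiving 10,240 tokens over a
# 10-second window would average 1,024 TPS per user.
print(tps_per_user([10_240] * 8, 10.0))  # -> 1024.0
```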

NeuReality Announces Inference Appliance Is Preloaded with AI Models

Caesarea, Israel – May 14, 2025 – NeuReality announced that its NR1 Inference Appliance now comes preloaded with enterprise AI models, including Llama, Mistral, Qwen, and Granite, plus support for private generative AI clouds and on-premises clusters. The company said the appliance is up and running in under 30 minutes and “delivers 3x better time-to-value, allowing customers […]

AI Inference: Meta Collaborates with Cerebras on Llama API

Sunnyvale, CA — Meta has teamed with Cerebras on AI inference in Meta’s new Llama API, combining Meta’s open-source Llama models with inference technology from Cerebras. Developers building on the Llama 4 Cerebras model in the API can expect speeds up to 18 times faster than traditional GPU-based solutions, according to Cerebras. “This acceleration unlocks […]
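
Assuming the Llama API exposes an OpenAI-compatible chat-completions interface (an assumption for illustration, not a detail confirmed in the announcement), a developer call might look like the sketch below. The base URL and model identifier are placeholders; consult Meta's and Cerebras' documentation for the actual values.

```python
# Minimal sketch of calling an OpenAI-compatible inference endpoint.
# The base_url and model name are illustrative placeholders only.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.llama.example/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="llama-4-maverick",  # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize MLPerf Inference in one sentence."}],
)
print(response.choices[0].message.content)
```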

GigaIO and d-Matrix to Build Inference Platform for Enterprise AI

CARLSBAD, Calif. – Edge-to-core AI platform company GigaIO today announced the next phase of its partnership with d-Matrix to deliver an inference solution for enterprises deploying AI at scale. Integrating d-Matrix’s Corsair inference platform into GigaIO’s SuperNODE architecture creates a solution designed to eliminate “the complexity and performance bottlenecks traditionally associated with large-scale AI inference deployment.” […]

NeuReality Launches Developer Portal for NR1 AI Inference Platform 

SAN JOSE — April 16, 2024 — NeuReality, an AI infrastructure technology company, today announced the release of a software developer portal and a demo for installing its software stack and APIs. The company said the announcement marks a milestone since the delivery of its 7nm AI inference server-on-a-chip, the NR1 NAPU, and bring-up of […]

In-Memory Computing Could Be an AI Inference Breakthrough

[CONTRIBUTED THOUGHT PIECE] In-memory computing promises to revolutionize AI inference. Given the rapid adoption of generative AI, it makes sense to pursue a new approach that reduces cost and power consumption by bringing compute into the memory itself while improving performance.
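
To make the idea concrete: in an analog in-memory design, weights are stored as conductances in a crossbar array, inputs arrive as voltages, and each column's summed current yields a dot product computed where the data already lives. Below is a rough numerical sketch of that principle (a conceptual illustration, not a model of any specific vendor's hardware).

```python
# Conceptual sketch of analog in-memory computing: weights live in a
# resistive crossbar as conductances, inputs are applied as voltages,
# and column currents realize a matrix-vector product in place, with
# no weight movement to a separate compute unit.
import numpy as np

rng = np.random.default_rng(0)
G = rng.uniform(0.0, 1.0, size=(256, 64))  # conductance matrix (weights)
v = rng.uniform(0.0, 1.0, size=256)        # input voltages (activations)

# Kirchhoff's current law sums the per-cell currents along each column,
# so the measurable column currents equal the product G^T @ v.
currents = G.T @ v

# Real devices add noise and quantization; a crude readout-noise model:
noisy = currents + rng.normal(0.0, 0.01 * currents.std(), size=currents.shape)
```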

Industry Heavyweights Form Ultra Ethernet Consortium for HPC and AI

SAN FRANCISCO – July 19, 2023 – A host of industry heavyweights have formed the Ultra Ethernet Consortium (UEC), intended to promote “industry-wide cooperation to build a complete Ethernet-based communication stack architecture for high-performance networking” for HPC and AI workloads, the new group said. Founding members include AMD, Arista, Broadcom, Cisco, Eviden (an Atos Business), […]

MLCommons: Latest MLPerf AI Benchmark Results Show Machine Learning Inference Advances

SAN FRANCISCO – September 8, 2022 – Today, the open engineering consortium MLCommons announced results from MLPerf Inference v2.1, which analyzes the performance of inference — the application of a trained machine learning model to new data. Inference allows for the intelligent enhancement of a vast array of applications and systems. Here are the results and […]
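
As a concrete reminder of what “inference” means here, the sketch below trains a toy model once and then applies it to unseen data; that second step is the workload MLPerf Inference measures, albeit on far larger models. The scikit-learn example is purely illustrative and is not part of the MLPerf suites.

```python
# "Inference" in the MLPerf sense: applying an already-trained model
# to new inputs. Toy example for illustration only.
from sklearn.linear_model import LogisticRegression
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.random((100, 4))
y_train = (X_train.sum(axis=1) > 2.0).astype(int)   # toy labels
model = LogisticRegression().fit(X_train, y_train)  # training, done once

X_new = rng.random((5, 4))          # previously unseen data
predictions = model.predict(X_new)  # the inference step being benchmarked
print(predictions)
```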

MLCommons Launches MLPerf Tiny AI Inference Benchmark

Today, open engineering consortium MLCommons released a new benchmark, MLPerf Tiny Inference, to measure trained neural network AI inference performance on low-power devices in small form factors. MLPerf Tiny v0.5 is MLCommons’ first inference benchmark suite for embedded device machine learning, a growing field in which AI-driven sensor data analytics is performed in real time, close […]

LeapMind Unveils Efficiera Ultra Low-Power AI Inference Accelerator IP

Today LeapMind announced Efficiera, an ultra-low-power AI inference accelerator IP for companies that design ASIC and FPGA circuits and other related products. Efficiera will enable customers to develop cost-effective, low-power edge devices and accelerate the go-to-market of custom devices featuring AI capabilities. “This product enables the inclusion of deep learning capabilities in various edge devices that are technologically limited by power consumption and cost, such as consumer appliances (household electrical goods), industrial machinery (construction equipment), surveillance cameras, and broadcasting equipment, as well as miniature machinery and robots with limited heat dissipation capabilities,” the company said.