NVIDIA Reports Blackwell Surpasses 1000 TPS/User Barrier with Meta’s Llama 4 Maverick

NVIDIA said it has achieved a record large language model (LLM) inference speed, announcing that an NVIDIA DGX B200 node with eight NVIDIA Blackwell GPUs achieved more than 1,000 tokens per second (TPS) per user on the 400-billion-parameter Llama 4 Maverick model. NVIDIA said the model is the largest and most powerful in the Llama 4 […]

NeuReality Announces Inference Appliance Is Preloaded with AI Models

Caesarea, Israel – May 14, 2025 – NeuReality announced that its NR1 Inference Appliance now comes preloaded with enterprise AI models, including Llama, Mistral, Qwen and Granite, plus support for private generative AI clouds and on-premises clusters. The company said the appliance is up and running in under 30 minutes and “delivers 3x better time-to-value, allowing customers […]

AI Inference: Meta Collaborates with Cerebras on Llama API

Sunnyvale, CA — Meta has teamed with Cerebras on AI inference in Meta’s new Llama API, combining Meta’s open-source Llama models with inference technology from Cerebras. Developers building on the Llama 4 Cerebras model in the API can expect speeds up to 18 times faster than traditional GPU-based solutions, according to Cerebras. “This acceleration unlocks […]

Dell Unveils Xeon 6 Servers, New Storage Appliances and Software for HPC-AI

Dell Technologies (NYSE: DELL) today made announcements across its data center server, storage and data protection portfolios. Here’s a rundown of the new offerings: Dell PowerEdge R470, R570, R670 and R770 ….

MLCommons Releases MLPerf Inference v5.0 Benchmark Results

Today, MLCommons announced new results for its MLPerf Inference v5.0 benchmark suite, which delivers machine learning (ML) system performance benchmarking. The organization said the results highlight that the AI community is focusing on generative AI ….

Vultr Announces Early Availability of NVIDIA HGX B200

WEST PALM BEACH, Fla. — Vultr, a privately-held cloud infrastructure company, today announced it offers early access to the NVIDIA HGX B200. Vultr Cloud GPU, accelerated by NVIDIA HGX B200, will provide training and inference support for enterprises looking to scale AI-native applications via Vultr’s 32 cloud data center regions worldwide. “We are pleased to […]

Axelera AI Wins EuroHPC Grant of up to €61.6M for AI Chiplet Development

AI hardware maker Axelera AI has unveiled Titania, which the company described as a high-performance, low-power and scalable AI inference chiplet. Part of the EuroHPC JU’s effort to develop a supercomputing ….

Blaize Receives Approval to List Its Common Stock and Warrants on Nasdaq

WASHINGTON & EL DORADO HILLS, Calif., Jan 13, 2025 – Blaize, Inc., a provider of artificial intelligence-enabled edge computing solutions, and acquisition company BurTech today announced that they expect to complete their previously announced business combination on January 12, 2025. The combined company will be named “Blaize Holdings, Inc.” and its common stock and warrants […]

Lambda Launches Inference API

Dec. 13, 2024 — AI company Lambda today announced its Inference API, which the company said enables access to LLMs through a serverless AI for “a fraction of a cent.” The company said Lambda Inference API offers low-cost, scalable AI inference with such models as Meta’s recently released Llama 3.3 70B Instruct (FP8) at $0.20 […]