FriendliAI Partners with NVIDIA on Nemotron 3 for Agentic AI Inference

Redwood City, CA – FriendliAI, an AI inference platform company, announced a partnership with NVIDIA to launch the Nemotron 3 model family, available on FriendliAI’s Dedicated Endpoints. Developers can deploy Nemotron 3 models on FriendliAI’s inference platform. Highlights include up to 13× faster token generation via a hybrid Mamba-Transformer MoE architecture and a multi-token prediction (MTP) technique; MoE routing […]

Red Hat Expands Inference Collaboration with AWS AI chips

Raleigh, N.C. – Red Hat today announced an expanded collaboration with Amazon Web Services (AWS) to deliver generative AI (gen AI) on AWS with Red Hat AI and AWS AI silicon. Through this collaboration, Red Hat aims to give IT decision-makers the flexibility to run high-performance, efficient AI inference at scale, regardless of the underlying […]

Deploying Inference Services Webinar at ALCF Dec. 3

Nov. 24, 2025 — The Argonne Leadership Computing Facility will hold a webinar showcasing the ALCF’s Inference Service on Wednesday, Dec. 3, 2025. Registration information can be found here. The facility’s inference service provides cloud-like access to diverse AI models—including large language models (LLMs)—on existing HPC clusters. ALCF’s Benoit Côté will demonstrate how to integrate the […]

d-Matrix and Andes Collaborate on RISC-V Accelerator for AI Inference

ST. LOUIS (SC25) — Nov. 17, 2025 – Generative AI inference compute company d-Matrix and Andes Technology, a supplier of RISC-V processor cores, announced that d-Matrix has selected the AndesCore AX46MPV for its next-generation Raptor inference architecture. The companies said the collaboration represents a convergence of memory-centric computing and open-standard processor innovation for AI workloads […]

Qualcomm Unveils Rack-Scale AI Inference Chips

Qualcomm Technologies today announced the launch of its AI inference solutions for data centers: the Qualcomm AI200 and AI250 chip-based accelerator cards and racks. The AI200 and AI250 are expected to be commercially available in 2026 and 2027, respectively ….

IBM and Groq Partner on Enterprise AI Deployment on watsonx Orchestrate

ARMONK, N.Y. and MOUNTAIN VIEW, Calif., Oct. 20, 2025 — IBM and Groq today announced a partnership to provide access to Groq’s AI inference technology, GroqCloud, on watsonx Orchestrate. The partnership is intended to provide high-speed AI inference capabilities at a cost that helps accelerate agentic AI deployment. As part of the partnership, Groq and IBM plan to integrate […]

Red Hat AI 3 Announced for Distributed AI Inference

RALEIGH, N.C. – Oct. 14, 2025 – Red Hat today announced Red Hat AI 3 as part of its enterprise AI platform. Bringing together the latest developments in Red Hat AI Inference Server, Red Hat Enterprise Linux AI (RHEL AI) and Red Hat OpenShift AI, the platform is designed to simplify the complexities of AI […]

Cerebras Reports 3,000 Tokens Per Second Inference on OpenAI gpt-oss-120b Model

Cerebras Systems today announced inference support for gpt-oss-120b, OpenAI’s first open-weight reasoning model, running at record inference speeds of 3,000 tokens per second on the Cerebras AI Inference Cloud, according to ….

Report: Qualcomm May Return to Data Center AI Chip Market

Qualcomm, the $39 billion chip design company, may return to the data center chip market, according to a story in The Register. “Qualcomm has teased a serious run at the data center market for years and in May CEO Cristiano Amon told the Computex ….

GigaIO Raises $21M in Series B Round

Carlsbad, California, July 17, 2025 – GigaIO, a provider of scalable infrastructure for AI inferencing, today announced it has raised $21 million in the first tranche of its Series B financing. The round was led by Impact Venture Capital, with participation from CerraCap Ventures, G Vision Capital, Mark IV Capital, and SourceCode Cerberus. The new […]