FriendliAI Partners with NVIDIA on Nemotron 3 for Agentic AI Inference

Redwood City, CA – FriendliAI, an AI inference platform company, announced a partnership with NVIDIA to launch the Nemotron 3 model family, available on FriendliAI’s Dedicated Endpoints. Developers can deploy Nemotron 3 models on FriendliAI’s inference platform. Highlights include up to 13× faster token generation via a hybrid Mamba-Transformer MoE architecture and a multi-token prediction (MTP) technique; MoE routing […]

Red Hat Expands Inference Collaboration with AWS AI chips

Raleigh, N.C. – Red Hat today announced an expanded collaboration with Amazon Web Services (AWS) to deliver generative AI (gen AI) on AWS with Red Hat AI and AWS AI silicon. Through this collaboration, Red Hat aims to give IT decision-makers the flexibility to run high-performance, efficient AI inference at scale, regardless of the underlying […]

Deploying Inference Services Webinar at ALCF Dec. 3

Nov. 24, 2025 — The Argonne Leadership Computing Facility will hold a webinar showcasing the ALCF’s Inference Service on Wednesday, Dec. 3, 2025. Registration information can be found here. The facility’s inference service provides cloud-like access to diverse AI models—including large language models (LLMs)—on existing HPC clusters. ALCF’s Benoit Côté will demonstrate how to integrate the […]

d-Matrix and Andes Collaborate on RISC-V Accelerator for AI Inference

ST. LOUIS (SC25) — Nov. 17, 2025 – Generative AI inference compute company d-Matrix and Andes Technology, a supplier of RISC-V processor cores, announced that d-Matrix has selected the AndesCore AX46MPV for its next-generation Raptor inference architecture. The companies said the collaboration represents a convergence of memory-centric computing and open-standard processor innovation for AI workloads […]

Qualcomm Unveils Rack-Scale AI Inference Chips

Qualcomm Technologies today announced the launch of its AI inference solutions for data centers: the Qualcomm AI200 and AI250 chip-based accelerator cards and racks. The AI200 and AI250 are expected to be commercially available in 2026 and 2027, respectively ….

IBM and Groq Partner on Enterprise AI Deployment on watsonx Orchestrate

ARMONK, N.Y. and MOUNTAIN VIEW, Calif., Oct. 20, 2025 — IBM and Groq today announced a partnership to provide access to Groq’s AI inference technology, GroqCloud, on watsonx Orchestrate. The partnership is intended to provide high-speed AI inference capabilities at a cost that helps accelerate agentic AI deployment. As part of the partnership, Groq and IBM plan to integrate […]

Red Hat AI 3 Announced for Distributed AI Inference

RALEIGH, N.C. – Oct. 14, 2025 – Red Hat today announced Red Hat AI 3 as part of its enterprise AI platform. Bringing together the latest developments in Red Hat AI Inference Server, Red Hat Enterprise Linux AI (RHEL AI) and Red Hat OpenShift AI, the platform is designed to simplify the complexities of AI […]

Cerebras Reports 3,000 Tokens Per Second Inference on OpenAI gpt-oss-120b Model

Cerebras Systems today announced inference support for gpt-oss-120b, OpenAI’s first open-weight reasoning model, running at record inference speeds of 3,000 tokens per second on the Cerebras AI Inference Cloud, according to ….

Report: Qualcomm May Return to Data Center AI Chip Market

Qualcomm, the $39 billion chip design company, may return to the data center chip market, according to a story in The Register. “Qualcomm has teased a serious run at the data center market for years and in May CEO Cristiano Amon told the Computex ….

GigaIO Raises $21M in Series B Round

Carlsbad, California, July 17, 2025 – GigaIO, a provider of scalable infrastructure for AI inferencing, today announced it has raised $21 million in the first tranche of its Series B financing. The round was led by Impact Venture Capital, with participation from CerraCap Ventures, G Vision Capital, Mark IV Capital, and SourceCode Cerberus. The new […]