Qualcomm Unveils Rack-Scale AI Inference Chips

Qualcomm AI Rack

Qualcomm Technologies today announced the launch of its AI inference solutions for data centers: the Qualcomm AI200 and AI250, offered as chip-based accelerator cards and as full racks.

The AI200 and AI250 are expected to be commercially available in 2026 and 2027, respectively.

Building on the company’s NPU technology, these solutions offer rack-scale performance and memory capacity designed for fast generative AI inference at high performance per dollar per watt, “marking a major leap forward in enabling scalable, efficient, and flexible generative AI across industries,” the company said.

Qualcomm AI200 introduces a rack-level AI inference solution designed to deliver low total cost of ownership and optimized performance for large language model (LLM) and large multimodal model (LMM) inference and other AI workloads. Each card supports 768 GB of LPDDR memory.

The Qualcomm AI250 solution will debut with a memory architecture based on near-memory computing, “providing a generational leap in efficiency and performance for AI inference workloads by delivering greater than 10x higher effective memory bandwidth and much lower power consumption. This enables disaggregated AI inferencing for efficient utilization of hardware while meeting customer performance and cost requirements,” Qualcomm said.

Both rack solutions feature direct liquid cooling, PCIe for scale-up, Ethernet for scale-out, confidential computing for secure AI workloads, and a rack-level power consumption of 160 kW.

“With Qualcomm AI200 and AI250, we’re redefining what’s possible for rack-scale AI inference. These innovative new AI infrastructure solutions empower customers to deploy generative AI at unprecedented TCO, while maintaining the flexibility and security modern data centers demand,” said Durga Malladi, SVP & GM, Technology Planning, Edge Solutions & Data Center, Qualcomm. “Our rich software stack and open ecosystem support make it easier than ever for developers and enterprises to integrate, manage, and scale already trained AI models on our optimized AI inference solutions. With seamless compatibility for leading AI frameworks and one-click model deployment, Qualcomm AI200 and AI250 are designed for frictionless adoption and rapid innovation.”

Qualcomm said its hyperscaler-grade AI software stack, which spans end to end from the application layer to the system software layer, is built for AI inference. The stack supports leading machine learning frameworks, inference engines, generative AI frameworks, and LLM/LMM inference optimization techniques such as disaggregated serving, according to the company.
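For readers unfamiliar with the term, disaggregated serving generally means splitting LLM inference into a compute-heavy prefill stage and a memory-bandwidth-heavy decode stage that run on separately provisioned workers. The sketch below is a minimal conceptual illustration of that split, not Qualcomm's implementation; all class and function names are invented for the example.

```python
# Conceptual sketch of disaggregated LLM serving: prefill and decode run on
# separate worker pools and exchange a KV cache. Names are illustrative only.
from dataclasses import dataclass
from typing import List

@dataclass
class KVCache:
    """Key/value attention state produced by prefill, consumed by decode."""
    tokens: List[int]  # a real system would hold per-layer key/value tensors

class PrefillWorker:
    """Processes the full prompt once; dominated by compute (FLOPs)."""
    def run(self, prompt_tokens: List[int]) -> KVCache:
        # Placeholder for a forward pass over the whole prompt.
        return KVCache(tokens=list(prompt_tokens))

class DecodeWorker:
    """Generates tokens one at a time; dominated by memory bandwidth."""
    def run(self, cache: KVCache, max_new_tokens: int) -> List[int]:
        generated = []
        for step in range(max_new_tokens):
            # Placeholder for a single-token forward pass reusing the cache.
            next_token = (sum(cache.tokens) + step) % 32000
            cache.tokens.append(next_token)
            generated.append(next_token)
        return generated

def serve(prompt_tokens: List[int], max_new_tokens: int = 8) -> List[int]:
    # Disaggregation point: the KV cache is handed from the prefill pool to
    # the decode pool, so each pool's hardware can be sized independently.
    cache = PrefillWorker().run(prompt_tokens)
    return DecodeWorker().run(cache, max_new_tokens)

if __name__ == "__main__":
    print(serve([101, 2023, 2003, 1037, 4937], max_new_tokens=5))
```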

Developers can deploy Hugging Face models via Qualcomm’s Efficient Transformers Library and Qualcomm AI Inference Suite.
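As a rough illustration of what deploying a Hugging Face model looks like on the developer side, the snippet below loads and runs a model with the standard transformers API. The Qualcomm-specific step is shown only as a commented placeholder, since the announcement does not document the Efficient Transformers Library or AI Inference Suite APIs.

```python
# Illustrative only: load and run a Hugging Face model with the standard
# transformers API. The Qualcomm-specific compile/deploy step is left as a
# placeholder because its API is not described in the announcement.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # any Hugging Face model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Rack-scale inference is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Hypothetical final step: a Qualcomm wrapper would take the model ID or the
# loaded model and compile it for the AI200/AI250 accelerators via the
# Efficient Transformers Library and AI Inference Suite mentioned above.
```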