RALEIGH, N.C. – Oct. 14, 2025 – Red Hat today announced Red Hat AI 3, the next evolution of its enterprise AI platform.
Bringing together the latest developments in Red Hat AI Inference Server, Red Hat Enterprise Linux AI (RHEL AI) and Red Hat OpenShift AI, the platform is designed to simplify the complexities of AI inference at scale, enabling organizations to move workloads from proofs-of-concept to production and improve collaboration around AI-enabled applications.
As enterprises move beyond AI experimentation, they face significant hurdles, including data privacy, cost control and managing diverse models. “The GenAI Divide: State of AI in Business,” a report from the Massachusetts Institute of Technology’s NANDA project, highlights the reality of production AI, with approximately 95% of organizations seeing no measurable financial return from roughly $40 billion in enterprise spending.
Red Hat AI 3 directly addresses these challenges by providing a more consistent, unified experience for CIOs and IT leaders to maximize the value of expensive, difficult-to-source hardware acceleration technologies. It makes it possible to rapidly scale and distribute AI workloads across hybrid, multi-vendor environments while simultaneously improving cross-team collaboration on next-generation AI workloads like agents, all on the same common platform. With a foundation built on open standards, Red Hat AI 3 meets organizations where they are on their AI journey, supporting any model on any hardware accelerator, from datacenters to public cloud and sovereign AI environments to the farthest edge.
As organizations move AI initiatives into production, the emphasis shifts from training and tuning models to inference, the “doing” phase of enterprise AI. Red Hat AI 3 emphasizes scalable and cost-effective inference, building on the vLLM and llm-d community projects and Red Hat’s model optimization capabilities to deliver production-grade serving of large language models (LLMs).
To help CIOs maximize the use of expensive and limited hardware acceleration, Red Hat OpenShift AI 3.0 introduces the general availability of llm-d, which reimagines how LLMs run natively on Kubernetes. llm-d enables intelligent distributed inference, tapping the proven value of Kubernetes orchestration and the performance of vLLM, allowing organizations to:
- Lower costs and improve efficiency by leveraging disaggregated serving to deliver better performance per dollar.
- Improve response times and latency with an intelligent, inference-aware load balancer built to handle the variable nature of AI workloads.
- Deliver operational simplicity and maximum reliability with prescriptive “Well-lit Paths” that streamline the deployment and optimization of massive models at scale.
llm-d builds on vLLM, evolving it from a single-node, high-performance inference engine into a distributed, consistent and scalable serving system designed to enable predictable performance, measurable ROI and effective infrastructure planning. These enhancements directly address the challenges of serving massive models, such as Mixture-of-Experts (MoE) models, and of handling highly variable workloads.
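For context, the sketch below shows the single-node vLLM engine that llm-d scales out across a Kubernetes cluster; it is a minimal, illustrative example rather than a Red Hat AI 3 configuration, and the model name, prompt and sampling settings are placeholders.

```python
# Minimal single-node vLLM example (illustrative only; model name is a placeholder).
# llm-d layers Kubernetes-native routing and disaggregated serving on top of this engine.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any Hugging Face-style causal LM
params = SamplingParams(temperature=0.7, max_tokens=64)

prompts = ["Explain disaggregated serving in one sentence."]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```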
Red Hat AI 3 delivers a unified, flexible experience tailored to the collaborative demands of building production-ready generative AI solutions. It is designed to demystify AI and deliver tangible value by fostering collaboration and unifying workflows across teams. It offers a single platform for both platform engineers and AI engineers to execute on their AI strategy, providing the productivity and efficiency needed to scale from proofs-of-concept to production.
The platform provides a powerful alternative to costly public APIs by enabling enterprises to deploy their own self-managed gen AI platforms and become their own Model-as-a-Service (MaaS) providers for their organization’s AI developers and AI applications. This approach empowers IT teams to manage costs and address a broader range of use cases that cannot run on public AI services.
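To illustrate the Model-as-a-Service pattern, the sketch below assumes the platform exposes a self-hosted, OpenAI-compatible endpoint to internal developers; the base URL, API key and model identifier are hypothetical placeholders, not values from Red Hat AI 3.

```python
# Hypothetical example: calling a self-hosted, OpenAI-compatible MaaS endpoint.
# The base_url, api_key, and model name are placeholders for illustration only.
from openai import OpenAI

client = OpenAI(
    base_url="https://models.example.corp/v1",  # internal gateway (placeholder)
    api_key="internal-token",                   # token issued by the IT platform team
)

response = client.chat.completions.create(
    model="granite-3-8b-instruct",  # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize our travel policy."}],
)
print(response.choices[0].message.content)
```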
Red Hat AI 3 centralizes the entire AI lifecycle to foster collaboration and streamline workflows. AI hub gives platform engineers a single point of control for the lifecycle and governance of all AI assets. For AI engineers, the gen AI studio offers a hands-on environment to interact with models, tune parameters, and prototype new AI applications.
The curated catalog further enhances this collaborative environment by helping teams manage a wide range of AI assets, including any model they choose. To ensure reliability, the catalog also provides a selection of Red Hat validated and optimized models that are tested to run on a consistent foundation. This curated selection includes leading open-source models like OpenAI’s gpt-oss, DeepSeek-R1, and specialized models such as Whisper (speech-to-text) and Voxtral Mini for voice-enabled agents. The catalog also offers compressed and optimized versions for running large models on fewer GPUs without sacrificing quality, giving users the flexibility to choose the models that best fit their unique needs.
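As a hedged illustration of how a speech-to-text model such as Whisper might be consumed by a voice-enabled agent, the sketch below assumes the model is served behind an OpenAI-compatible transcription endpoint; the URL, token, audio file and model name are all placeholders.

```python
# Hypothetical sketch: transcribing audio with a Whisper-family model served behind
# an OpenAI-compatible endpoint; URL, token, file and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://models.example.corp/v1", api_key="internal-token")

with open("meeting.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",  # placeholder identifier for a catalog model
        file=audio_file,
    )
print(transcript.text)
```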
AI agents are poised to transform how applications are built, and their complex, autonomous workflows will place heavy demands on inference capabilities. The Red Hat OpenShift AI 3.0 release continues to lay the groundwork for scalable agentic AI systems not only through its inference capabilities but also with new features and enhancements focused on agent management.
To accelerate agent creation and deployment, Red Hat has introduced a Unified API layer based on Llama Stack, which helps align development with industry standards such as the OpenAI API. Additionally, to champion a more open and interoperable ecosystem, Red Hat is an early adopter of the Model Context Protocol (MCP), a powerful, emerging standard that streamlines how AI models interact with external tools, a fundamental capability for modern AI agents.
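To make the tool-calling idea concrete, here is a minimal sketch of an MCP server built with the open source MCP Python SDK; the server name and tool are made-up examples for illustration and are not part of Red Hat AI 3.

```python
# Minimal MCP server sketch using the open source MCP Python SDK (package: "mcp").
# The tool below is a made-up illustration of how agents discover and call tools.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory-tools")  # server name is arbitrary

@mcp.tool()
def check_stock(sku: str) -> str:
    """Return a stock level for a SKU (stub data for illustration)."""
    return f"SKU {sku}: 42 units in stock"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```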
“As enterprises scale AI from experimentation to production, they face a new wave of complexity, cost and control challenges,” said Joe Fernandes, vice president and general manager, AI Business Unit, Red Hat. “With Red Hat AI 3, we are providing an enterprise-grade, open source platform that minimizes these hurdles. By bringing new capabilities like distributed inference with llm-d and a foundation for agentic AI, we are enabling IT teams to more confidently operationalize next-generation AI, on their own terms, across any infrastructure.”