Cerebras Reports 3,000 Tokens Per Second Inference on OpenAI gpt-oss-120b Model

SUNNYVALE, Calif. & SAN FRANCISCO — Cerebras Systems today announced inference support for gpt-oss-120B, OpenAI’s first open-weight reasoning model, running at what the company describes as a record inference speed of 3,000 tokens per second on the Cerebras AI Inference Cloud.

This is the first time an OpenAI model has run full-model inference on Cerebras’ wafer-scale AI infrastructure. Cerebras said gpt-oss-120B can be accessed on the Cerebras Cloud with a free API key (cerebras.ai/openai).
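For readers who want to try it, the following is a minimal sketch of calling the model over the Cerebras Cloud’s OpenAI-compatible REST interface. The base URL (api.cerebras.ai/v1), the model identifier (gpt-oss-120b), and the CEREBRAS_API_KEY environment variable are assumptions to confirm against the Cerebras documentation for your free API key.

```python
# Minimal sketch: call gpt-oss-120B on the Cerebras Cloud via its
# OpenAI-compatible chat completions endpoint.
# Assumptions: base URL, model id, and env var name below may differ;
# check the Cerebras docs for the values tied to your free API key.
import os
import requests

API_KEY = os.environ["CEREBRAS_API_KEY"]   # assumed env var holding the free API key
BASE_URL = "https://api.cerebras.ai/v1"    # assumed OpenAI-compatible base URL

response = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "gpt-oss-120b",  # assumed model identifier on Cerebras
        "messages": [
            {"role": "user", "content": "Summarize wafer-scale inference in one sentence."}
        ],
    },
    timeout=60,
)
response.raise_for_status()
# The OpenAI-compatible response places the reply under choices[0].message.content.
print(response.json()["choices"][0]["message"]["content"])
```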

Built for math, science and code, this 120B-parameter model is on par with top proprietary models like Gemini 2.5 Flash and Claude Opus 4, Cerebras said.

“By eliminating GPU memory bandwidth bottlenecks and communication overhead, Cerebras wafer-scale AI inference delivered a world-record 3,000 tokens per second output speed — a major advance in responsiveness for high-intelligence AI,” the company said. According to Cerebras, this performance lets organizations use Cerebras-powered gpt-oss-120B to build live coding assistants, instant Q&A and summarization over large documents, and fast agentic research chains. The company added that “high-intelligence AI reasoning use cases have long wait times on proprietary models running on GPUs – that lag is now dramatically reduced with gpt-oss-120B on Cerebras.”

“OpenAI’s open-weight reasoning model release is a defining moment for the AI community,” said Andrew Feldman, CEO and co-founder of Cerebras. “With gpt-oss-120B, we’re not just breaking speed records—we’re redefining what’s possible. OpenAI on Cerebras delivers frontier intelligence with blistering performance, lower cost, full openness, and plug-and-play ease of use. It’s the ultimate AI platform: smart, fast, affordable, easy to use, and fully open.”

Developers can swap their existing OpenAI endpoints for Cerebras in 15 seconds without refactoring or migration headaches, according to the company.
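In practice, this swap amounts to repointing an existing OpenAI client at Cerebras. The sketch below assumes a Cerebras-issued API key, the api.cerebras.ai/v1 base URL, and the gpt-oss-120b model identifier; everything else in the OpenAI SDK code stays the same.

```python
# Minimal sketch of the endpoint swap using the official openai Python SDK:
# existing chat-completions code keeps working once base_url and api_key
# point at Cerebras. The base URL and model identifier are assumptions;
# confirm them against the Cerebras documentation.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed Cerebras OpenAI-compatible endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],  # assumed env var for the Cerebras key
)

completion = client.chat.completions.create(
    model="gpt-oss-120b",  # assumed model identifier on Cerebras
    messages=[{"role": "user", "content": "Draft a quick code review checklist."}],
)
print(completion.choices[0].message.content)
```

Because only the constructor arguments change, the same pattern applies to streaming calls or agent frameworks that accept a custom OpenAI-compatible base URL.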

OpenAI’s Apache 2.0 license for the open-weight model lets users fine-tune it for their domain, deploy it on-premises for sensitive or regulated data, or move freely across clouds.

“Our open models let developers—from solo builders to large enterprise teams—run and customize AI on their own infrastructure, unlocking new possibilities across industries and use cases,” said Dmitry Pimenov, product lead at OpenAI. “Through deployment partners like Cerebras, we’re together able to provide powerful, flexible tools that make it easier than ever to build, innovate, and scale.”