Redwood City, CA – FriendliAI, an AI inference platform company, announced a partnership with NVIDIA to launch the Nemotron 3 model family on FriendliAI’s Dedicated Endpoints, where developers can deploy the models directly on its inference platform.
Highlights include:
- Up to 13× faster token generation via a hybrid Mamba-Transformer MoE architecture and the multi-token prediction (MTP) technique
- MoE routing for reduced compute load and real-time latency
- Leading accuracy on SWE-bench, GPQA Diamond, AIME 2025, Humanity's Last Exam, IFBench, RULER, and Arena Hard
- Fully open weights, datasets, and recipes for maximum transparency and control
“The combination of NVIDIA’s Nemotron 3 Nano and FriendliAI’s platform represents a milestone in unlocking the promise of agentic AI,” said Byung-Gon Chun, Founder and CEO of FriendliAI. “Efficient, affordable inference is fundamental to deploying agentic AI at scale, and our commitment to performance and scalability makes that possible.”
NVIDIA’s Nemotron 3 is a family of reasoning models designed for agentic AI and reasoning-intensive applications in fields such as software development, retail, finance, and cybersecurity. The fully open MoE small language model is purpose-built to deliver exceptional reasoning performance while maintaining the efficiency required for production use.
Inference speed is crucial for agentic AI because it enables real-time interaction, scalability, and cost efficiency.
The company said that, running Nemotron 3, FriendliAI delivers:
- Faster performance with optimized GPU kernels
- More efficient MoE serving with online quantization and speculative decoding
- Predictable latency and autoscaling for traffic spikes
- GPU cost savings of more than 50 percent on Dedicated Endpoints
- OpenAI-compatible APIs for easy integration
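Because the endpoints follow the OpenAI API convention, existing client code can be pointed at them with only a URL and key change. The sketch below builds a request in the standard OpenAI chat-completions format using only the Python standard library; the base URL, key, and model name are placeholders, not documented FriendliAI values.

```python
# Minimal sketch of targeting an OpenAI-compatible chat-completions
# endpoint. BASE_URL, API_KEY, and the model name are placeholders --
# substitute the values from your own Dedicated Endpoint.
import json
import urllib.request

BASE_URL = "https://example.invalid/v1"  # placeholder endpoint URL
API_KEY = "YOUR_API_KEY"                 # placeholder credential

def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build a POST request in the OpenAI chat-completions format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# The resulting request body uses the standard OpenAI schema, so any
# OpenAI-compatible SDK or endpoint can consume the same structure.
req = build_chat_request("nemotron-3-model-id", "Summarize this bug report.")
print(json.loads(req.data)["messages"][0]["role"])
```

The same payload shape works with the official OpenAI Python SDK by setting its `base_url` and `api_key` parameters to the endpoint's values, which is typically how "OpenAI-compatible" integration is done in practice.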
“The combination of cost efficiency and speed has positioned FriendliAI as a compelling solution for enterprises seeking to optimize their AI infrastructure investments,” added Chun.