Lambda Launches Inference API

Dec. 13, 2024 — AI company Lambda today announced its Inference API, which the company said provides access to large language models (LLMs) through a serverless API for “a fraction of a cent.”

The company said the Lambda Inference API offers low-cost, scalable AI inference with models such as Meta’s recently released Llama 3.3 70B Instruct (FP8) at $0.20 per million input and output tokens. Lambda said its pricing is less than half the cost of most competitors.
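A back-of-the-envelope calculation shows how the quoted per-million-token rate translates to the “fraction of a cent” claim (the token count below is an arbitrary example, not a figure from the announcement):

```python
# Cost sketch at the quoted $0.20 per million input and output tokens.
price_per_million_tokens = 0.20  # USD, per the announcement
tokens = 10_000                  # example request size, chosen for illustration

cost = price_per_million_tokens * tokens / 1_000_000
print(f"${cost:.4f}")  # → $0.0020, i.e. a fifth of a cent
```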

The Lambda Inference API offers two model tiers: “Core” models, which are selected for stability and long-term support, and “Sandbox” models, which provide access to the latest innovations with more frequent updates. The API scales to handle workloads of any size and integrates with OpenAI-style endpoints, simplifying implementation, according to the company.
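Because the endpoints follow the OpenAI API shape, a request can be built with the familiar chat-completions payload. The base URL, model identifier, and API-key placeholder below are assumptions for illustration, not details from the announcement:

```python
import json
import urllib.request

# Assumed base URL and model id -- consult Lambda's docs for the real values.
BASE_URL = "https://api.lambdalabs.com/v1"
payload = {
    "model": "llama3.3-70b-instruct-fp8",  # hypothetical model identifier
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder, not a real key
    },
)
# With a valid key, the call mirrors any OpenAI-compatible client:
# response = json.load(urllib.request.urlopen(req))
```

The OpenAI-style payload means existing client libraries that accept a custom base URL should work unchanged, which is the “simplified implementation” the company refers to.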

“But let’s face it: deploying AI at scale is no easy feat,” the company stated in its announcement blog. “It requires massive amounts of compute, significant expertise in MLOps to set everything up and performance tune it, as well as a hefty budget to keep it all running smoothly.”

The Inference API runs on an inference stack built for AI and also serves Meta’s Llama 3.1 405B at $0.90 per million tokens. Pricing is pay-per-token with no rate limits, and dynamic scaling lets users “scale without worrying about infrastructure bottlenecks,” Lambda said.