Cerebras Claims Fastest AI Inference

AI compute company Cerebras Systems today announced what it says is the fastest AI inference solution. Cerebras Inference delivers 1,800 tokens per second for Llama 3.1 8B and 450 tokens per second for Llama 3.1 70B, according to the company, making it 20 times faster than GPU-based solutions in hyperscale clouds.