Inception Launches Mercury 2: Diffusion LLM Hits 1,009 Tokens per Second, 5x Faster Than Speed-Optimized Models
Published 2026-02-24 · Ingested 2026-02-25 · Foundation Models · High · ⭐ Timeline Candidate
Summary
Inception, a startup founded by Stanford, UCLA, and Cornell researchers behind foundational diffusion model work, launched Mercury 2 on February 24, 2026. Mercury 2 is a production-grade reasoning LLM built on a diffusion-based architecture rather than the autoregressive, token-by-token generation used by GPT, Claude, and Gemini. It achieves approximately 1,009 tokens per second of output throughput, compared with roughly 89 tokens per second for Claude 4.5 Haiku Reasoning and 71 tokens per second for GP
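To put the quoted throughput figures in perspective, the arithmetic can be sketched in a few lines of Python. This is a rough illustration, not a benchmark: the three tokens-per-second values come from the summary, while the 2,000-token response length, the constant-throughput assumption, and the helper names `speedup` and `generation_seconds` are hypothetical choices made for the example. Time to first token is ignored, and the second baseline is labeled by its truncated name because the summary cuts off before naming it.

```python
# Output throughput in tokens per second, as quoted in the summary.
MERCURY_2_TPS = 1009
BASELINES_TPS = {
    "Claude 4.5 Haiku Reasoning": 89,
    "GP (name truncated in source)": 71,
}


def speedup(fast_tps: float, slow_tps: float) -> float:
    """Return how many times faster fast_tps is than slow_tps."""
    return fast_tps / slow_tps


def generation_seconds(num_tokens: int, tps: float) -> float:
    """Seconds to emit num_tokens at a constant throughput.

    Assumption: throughput is steady and time to first token is ignored.
    """
    return num_tokens / tps


# Hypothetical response length, chosen only for illustration.
RESPONSE_TOKENS = 2000

print(f"Mercury 2: {generation_seconds(RESPONSE_TOKENS, MERCURY_2_TPS):.2f} s "
      f"for {RESPONSE_TOKENS} tokens")
for name, tps in BASELINES_TPS.items():
    ratio = speedup(MERCURY_2_TPS, tps)
    seconds = generation_seconds(RESPONSE_TOKENS, tps)
    print(f"{name}: {seconds:.2f} s, Mercury 2 is about {ratio:.1f}x faster")
```

Under these assumptions, a 2,000-token response takes about 2 seconds at 1,009 tokens per second and over 20 seconds at either baseline rate. End-to-end latency in practice also depends on time to first token, batching, and network overhead, none of which the summary reports.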
Radar Context
Claude Code
Alignment: New signal not yet covered
inception, mercury-2, diffusion-llm, foundation-models, inference, throughput, speed, agentic, cost, architecture