Inception Launches Mercury 2: Diffusion LLM Hits 1,009 Tokens per Second, 5x Faster Than Speed-Optimized Models

Published 2026-02-24 | Ingested 2026-02-25 | Foundation Models | High | ⭐ Timeline Candidate

Summary

Inception, a startup founded by Stanford, UCLA, and Cornell researchers behind foundational diffusion model work, launched Mercury 2 on February 24, 2026. Mercury 2 is a production-grade reasoning LLM built on a diffusion-based architecture rather than the autoregressive, token-by-token generation used by GPT, Claude, and Gemini. It achieves approximately 1,009 tokens per second of output throughput, compared with roughly 89 tokens per second for Claude 4.5 Haiku Reasoning and 71 tokens per second for GP

Radar Context

Claude Code

Alignment: New signal not yet covered

inception, mercury-2, diffusion-llm, foundation-models, inference, throughput, speed, agentic, cost, architecture