Google Releases DiffusionGemma, an Open-Weight Diffusion Text Model at 500+ Tokens/Second
Published 2026-06-10 · Foundation Models · Medium · ⭐ Timeline Candidate
Summary
Google released DiffusionGemma (`google/diffusiongemma-26B-A4B-it`) under an Apache 2.0 license — an open-weight productization of its previously shelved Gemini Diffusion research. Simon Willison measured at least 500 tokens/second running through NVIDIA's free NIM cloud API (the experimental version had peaked around 857 tok/s). Diffusion-based text generation is notable because it produces tokens in parallel rather than strictly left-to-right, which can deliver dramatically lower latency for c
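The parallel-versus-sequential point in the summary can be illustrated with a rough count of sequential model calls. This is a toy sketch, not a description of DiffusionGemma itself: the function names, the block size, and the number of denoising steps are all hypothetical values chosen for illustration, and a single diffusion step may cost more compute than a single autoregressive step, so the step ratio is not a wall-clock speedup.

```python
def autoregressive_steps(num_tokens: int) -> int:
    """Sequential model calls when each call emits exactly one token."""
    return num_tokens


def diffusion_steps(num_tokens: int, block_size: int, denoise_steps: int) -> int:
    """Sequential model calls when tokens are produced block by block and
    every position in a block is refined in parallel at each denoising step.

    block_size and denoise_steps are hypothetical knobs for this sketch.
    """
    blocks = -(-num_tokens // block_size)  # ceiling division
    return blocks * denoise_steps


def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Throughput as usually reported: generated tokens over wall-clock time."""
    return num_tokens / elapsed_seconds


if __name__ == "__main__":
    n = 1024  # illustrative response length
    print("autoregressive calls:", autoregressive_steps(n))
    print("diffusion-style calls:", diffusion_steps(n, block_size=256, denoise_steps=16))
    # 1,000 tokens in 2 seconds corresponds to the 500 tok/s figure in the summary.
    print("throughput:", tokens_per_second(1000, 2.0), "tok/s")
```

Under these assumed numbers the diffusion-style schedule needs far fewer sequential calls for the same output length, which is the intuition behind the latency claim; real throughput depends on hardware, batch size, and the model's actual sampling schedule.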
Alignment: New signal not yet covered
Related Positions: multi-model-multi-vendor, ai-infrastructure-strategy
Tags: google, diffusiongemma, open-weights, diffusion-model, inference-speed, apache-2, nvidia-nim, gemma