Google Releases Multi-Token Prediction Drafters for Gemma 4, Delivering Up to 3x Inference Speedup

Published 2026-05-05 | Ingested 2026-05-08 | AI Infrastructure and Compute | Medium | ⭐ Timeline Candidate

Summary

Google released Multi-Token Prediction (MTP) drafters for the Gemma 4 model family under Apache 2.0, delivering up to 3x inference speedup without quality or reasoning degradation. The technique uses speculative decoding: a lightweight drafter model generates multiple predicted tokens while the primary Gemma 4 target model processes context, then the target model verifies all suggestions simultaneously. This allows multiple tokens to be output in the time normally required for one, addressing th
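The draft-then-verify loop described above can be sketched in Python. This is a minimal, generic sketch of greedy speculative decoding, not Google's MTP drafter implementation or any Gemma API. The names `toy_target`, `toy_drafter`, `greedy_decode`, and `speculative_decode` are hypothetical. The "models" are toy integer functions, and verification uses exact greedy matching rather than the probabilistic acceptance used for sampled decoding. The sketch shows why quality is preserved: every emitted token is one the target model would have chosen on its own.

```python
from typing import Callable, List, Tuple

# Hypothetical toy "model": maps a token context to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one sequential target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out[len(prompt):]


def speculative_decode(target: Model, drafter: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (new tokens, verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n_new:
        # 1. The cheap drafter proposes k tokens autoregressively.
        draft: List[int] = []
        for _ in range(k):
            draft.append(drafter(out + draft))
        # 2. The target predicts the next token at each of the k + 1 positions.
        #    A real system does this in ONE batched forward pass; we loop for clarity.
        target_preds = [target(out + draft[:i]) for i in range(k + 1)]
        rounds += 1
        # 3. Accept the longest draft prefix that matches the target's choices.
        n_accept = 0
        while n_accept < k and draft[n_accept] == target_preds[n_accept]:
            n_accept += 1
        # 4. Keep accepted tokens, plus the target's token at the first mismatch
        #    (or a free "bonus" token when all k drafts were accepted).
        out.extend(draft[:n_accept])
        out.append(target_preds[n_accept])
    return out[len(prompt):len(prompt) + n_new], rounds


def toy_target(ctx: List[int]) -> int:
    # Hypothetical stand-in for the large target model.
    return (sum(ctx) * 7 + len(ctx)) % 50


def toy_drafter(ctx: List[int]) -> int:
    # Hypothetical drafter, deliberately imperfect:
    # it disagrees with the target whenever len(ctx) % 5 == 0.
    guess = (sum(ctx) * 7 + len(ctx)) % 50
    return guess if len(ctx) % 5 else (guess + 1) % 50


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = greedy_decode(toy_target, prompt, 20)
    fast, rounds = speculative_decode(toy_target, toy_drafter, prompt, 20, k=4)
    assert fast == baseline  # output is identical to target-only decoding
    print(f"{len(fast)} tokens in {rounds} target rounds (baseline: 20)")
```

Each round costs one target pass (batched in a real system) and yields between 1 and k + 1 tokens. The achievable speedup therefore depends on how often the drafter's guesses match the target's choices.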

Alignment: Reinforces current position

Related Positions: AI Infrastructure Strategy, AI-Assisted Development Tooling

google, gemma-4, multi-token-prediction, inference-speedup, speculative-decoding, open-source, llm-inference, edge-deployment, ai-infrastructure, performance-optimization