Streaming Experts Technique Enables Trillion-Parameter MoE Models on Consumer Hardware
Published 2026-03-24 | Ingested 2026-03-25 | AI Infrastructure and Compute | Medium
Summary
Simon Willison highlights rapid progress in the "streaming experts" technique, which lets Mixture-of-Experts (MoE) models far larger than available RAM run on consumer hardware by streaming from SSD only the expert weights needed for each token processed. Dan Woods first demonstrated the Qwen3.5-397B-A17B model (397 billion total parameters, 17 billion active per token) running in just 48GB of RAM. Within five days, community member @seikixtc reported running Kimi K2.5, a model approaching 1 tri
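The core idea (load only the experts the router selects for the current token, and keep the rest on disk) can be illustrated with a toy sketch. This is not Dan Woods's or @seikixtc's implementation. `ExpertStreamer`, `top_k_route`, and `moe_forward` are hypothetical names. Each file stands in for one expert's weights on SSD, each "expert" is a single weight vector rather than a real feed-forward block, and the LRU cache of resident experts is an assumed policy chosen for illustration.

```python
import array
import math
import os
import tempfile
from collections import OrderedDict


class ExpertStreamer:
    """Hypothetical helper: keeps at most `max_resident` experts in RAM; the rest stay on disk."""

    def __init__(self, expert_dir, max_resident):
        self.expert_dir = expert_dir
        self.max_resident = max_resident
        self.cache = OrderedDict()  # expert_id -> array('f') of weights, in LRU order
        self.disk_reads = 0

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # mark as most recently used
            return self.cache[expert_id]
        path = os.path.join(self.expert_dir, f"expert_{expert_id}.bin")
        weights = array.array("f")
        with open(path, "rb") as f:
            weights.frombytes(f.read())  # the "stream from SSD" step
        self.disk_reads += 1
        self.cache[expert_id] = weights
        if len(self.cache) > self.max_resident:
            self.cache.popitem(last=False)  # evict the least recently used expert
        return weights


def top_k_route(logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their gate weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, gate_logits, streamer, k=2):
    """Gate-weighted sum of the selected experts; each toy expert scales x elementwise."""
    out = [0.0] * len(x)
    for expert_id, gate in top_k_route(gate_logits, k):
        w = streamer.get(expert_id)  # only routed experts are ever read from disk
        for j, xj in enumerate(x):
            out[j] += gate * w[j] * xj
    return out


if __name__ == "__main__":
    num_experts, dim = 8, 4
    with tempfile.TemporaryDirectory() as d:
        # Write one weight file per expert; the values are placeholders.
        for e in range(num_experts):
            with open(os.path.join(d, f"expert_{e}.bin"), "wb") as f:
                array.array("f", [float(e + 1)] * dim).tofile(f)
        streamer = ExpertStreamer(d, max_resident=3)
        x = [1.0, 2.0, 3.0, 4.0]
        # Gate logits are hard-coded per token here; a real model's router computes them.
        for logits in ([5, 4, 0, 0, 0, 0, 0, 0], [5, 0, 4, 0, 0, 0, 0, 0]):
            moe_forward(x, logits, streamer, k=2)
        print(streamer.disk_reads, list(streamer.cache))  # 3 [1, 0, 2]
```

In this run, only 3 of the 8 expert files are ever read, and expert 0 is served from RAM on the second token because both tokens routed to it. Real systems work at far larger scale and may rely on memory mapping and the OS page cache rather than an explicit LRU, but the memory bound comes from the same property: RAM must hold the active experts plus shared weights, not all 397B parameters.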
Alignment: New signal not yet covered
Related Positions: ai-infrastructure-strategy.md, multi-model-multi-vendor.md
mixture-of-experts, streaming-experts, model-inference, consumer-hardware, ssd-offloading, open-weight-models, qwen, kimi-k2, ram-optimization, ai-infrastructure