Streaming Experts Technique Enables Trillion-Parameter MoE Models on Consumer Hardware

Published 2026-03-24 | Ingested 2026-03-25 | AI Infrastructure and Compute | Medium

Summary

Simon Willison highlights rapid progress in the "streaming experts" technique, which allows Mixture-of-Experts (MoE) models far larger than available RAM to run on consumer hardware by streaming only the necessary expert weights from SSD for each token processed. Dan Woods initially demonstrated running the Qwen3.5-397B-A17B model (397 billion parameters, 17 billion active) in just 48GB of RAM. Within five days, community member @seikixtc reported running the Kimi K2.5, a model approaching 1 tri
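The core idea in the summary, reading only the routed experts' weights from disk for each token, can be sketched as a toy example. This is a minimal illustration under stated assumptions, not the implementation used in the demonstrations above: the sizes (`NUM_EXPERTS`, `EXPERT_DIM`, `TOP_K`), the flat file layout, and the helper names are all hypothetical, and the router scores are hard-coded rather than computed by a model. The expert file is memory-mapped, so the operating system pages in only the byte ranges that are actually read.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical toy sizes, not taken from any real model.
NUM_EXPERTS = 8
EXPERT_DIM = 4                 # each toy expert is EXPERT_DIM float32 weights
TOP_K = 2                      # experts activated per token
EXPERT_BYTES = EXPERT_DIM * 4  # float32 = 4 bytes


def write_toy_experts(path):
    """Write the experts back to back; every weight of expert e equals e + 1."""
    with open(path, "wb") as f:
        for e in range(NUM_EXPERTS):
            values = [float(e + 1)] * EXPERT_DIM
            f.write(struct.pack(f"<{EXPERT_DIM}f", *values))


def route(scores, k=TOP_K):
    """Return the indices of the k highest router scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def moe_token(mm, token, scores, loaded):
    """Read only the routed experts from the mapped file and sum their outputs.

    Simplification: expert outputs are summed without gate weighting.
    """
    out = 0.0
    for e in route(scores):
        # Only this expert's byte range is touched, so only it is paged in.
        weights = struct.unpack_from(f"<{EXPERT_DIM}f", mm, e * EXPERT_BYTES)
        loaded.append(e)
        out += sum(w * x for w, x in zip(weights, token))
    return out


def run_demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_toy_experts(path)
        loaded = []
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                token = [1.0, 1.0, 1.0, 1.0]
                scores = [0.1, 0.9, 0.05, 0.2, 0.8, 0.0, 0.3, 0.15]
                out = moe_token(mm, token, scores, loaded)
        return out, loaded
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(run_demo())
```

In this toy run only experts 1 and 4 are read; the other six stay on disk. The same proportion drives the real case: with 17 billion of 397 billion parameters active, roughly 4% of the weights are needed for any one token, which is why the working set can fit in far less RAM than the full model.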

Alignment: New signal not yet covered

Related Positions: ai-infrastructure-strategy.md, multi-model-multi-vendor.md

mixture-of-experts, streaming-experts, model-inference, consumer-hardware, ssd-offloading, open-weight-models, qwen, kimi-k2, ram-optimization, ai-infrastructure