MegaTrain Enables Full Precision Training of 100B+ Parameter LLMs on a Single GPU

Published 2026-04-08 | AI Infrastructure and Compute | Medium

Summary

Researchers have introduced MegaTrain, a memory-centric system that enables full-precision training of large language models with over 100 billion parameters on a single GPU. The system fundamentally inverts the traditional GPU-centric training paradigm by storing parameters and optimizer states in host (CPU) memory and treating GPUs as transient compute engines. For each layer, parameters are streamed in and gradients are computed and offloaded, minimizing persistent device-side state. To addr
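The layer-by-layer flow described in the summary can be illustrated with a toy simulation. This is a minimal sketch in plain Python, not MegaTrain's implementation: the function name `train_step_streamed`, the scalar "layers", the plain SGD update, the choice to apply the update on the host side, and the way activations are kept are all assumptions made for illustration. The `device` dictionary only stands in for GPU memory so that the peak number of resident tensors can be counted.

```python
# Toy simulation of layer-streamed training; no real GPU is involved.
# "host_weights" stands in for CPU memory, "device" for transient GPU memory.
# All names here are hypothetical and chosen for illustration only.

def train_step_streamed(host_weights, x, target, lr=0.01):
    """One SGD step on a chain of scalar layers y = w * x, one layer resident at a time."""
    device = {}            # whatever is "resident on the GPU" right now
    peak_resident = 0      # largest number of tensors held on the device at once
    activations = [x]      # layer inputs kept for the backward pass (an assumption)

    # Forward: stream in one layer's weight, use it, then evict it.
    for w in host_weights:
        device["w"] = w
        peak_resident = max(peak_resident, len(device))
        activations.append(device["w"] * activations[-1])
        device.clear()

    loss = 0.5 * (activations[-1] - target) ** 2
    grad_out = activations[-1] - target   # derivative of the loss w.r.t. the output

    # Backward: stream layers in reverse and offload each gradient immediately.
    for i in reversed(range(len(host_weights))):
        device["w"] = host_weights[i]
        device["grad_w"] = grad_out * activations[i]
        peak_resident = max(peak_resident, len(device))
        grad_out = grad_out * device["w"]          # gradient passed to the previous layer
        host_weights[i] -= lr * device["grad_w"]   # update applied to the host-side copy
        device.clear()

    return loss, peak_resident


if __name__ == "__main__":
    weights = [0.5, 2.0, 1.5]
    for step in range(3):
        loss, peak = train_step_streamed(weights, x=1.0, target=3.0)
        print(f"step {step}: loss={loss:.4f}, peak resident tensors={peak}")
```

However many layers the chain has, the simulated device never holds more than one layer's weight and gradient at a time, which is the property the summary attributes to treating the GPU as a transient compute engine. For a sense of scale, and assuming "full precision" means 32-bit floats, 100 billion parameters occupy 400 GB for the weights alone, before gradients and optimizer states are counted.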

Alignment: New signal not yet covered

Related Positions: ai-infrastructure-strategy.md

llm-training, single-gpu, memory-offloading, ai-infrastructure, gpu-compute, large-language-models, training-efficiency, cpu-gpu-bandwidth, research-paper, cost-reduction