Andon Labs: Reality as the Final Eval for Long-Horizon AI Agents
Published 2026-06-04 | Ingested 2026-06-05 | Agentic AI | High
Summary
Swyx and Vibhu hosted Andon Labs founders Lukas Petersson and Axel Backlund on the June 4 Latent Space episode to discuss the company's research methodology: evaluating AI agents in real-world operational environments rather than in compressed benchmark tests. Andon Labs, which grew out of dangerous-capability evaluations for frontier AI labs, has deployed Claude-powered AI agents to operate real businesses: a San Francisco bookstore (Luna, with a three-year lease and human employees) and a Swedish vending m
Radar Context
Claude Code
Alignment: Reinforces current position
Related Positions: agentic-workflows, ai-governance-and-risk
Related Partnerships: Anthropic (Claude)
andon-labs, real-world-evals, agentic-ai, benchmark-saturation, emergent-behavior, eval-awareness, long-horizon-agents, ai-safety, latent-space, enterprise-agentic