Andon Labs: Reality as the Final Eval for Long-Horizon AI Agents

Published 2026-06-04 | Ingested 2026-06-05 | Agentic AI | High

Summary

Swyx and Vibhu hosted Andon Labs founders Lukas Petersson and Axel Backlund on the June 4 Latent Space episode to discuss their research methodology: evaluating AI agents in real-world operational environments rather than compressed benchmark tests. Andon Labs, which grew out of dangerous-capability assessments for frontier AI labs, has deployed Claude-powered AI agents to operate real businesses: a San Francisco bookstore (Luna, with a three-year lease and human employees) and a Swedish vending m

Radar Context

Claude Code

Alignment: Reinforces current position

Related Positions: agentic-workflows, ai-governance-and-risk

Related Partnerships: Anthropic (Claude)

andon-labs, real-world-evals, agentic-ai, benchmark-saturation, emergent-behavior, eval-awareness, long-horizon-agents, ai-safety, latent-space, enterprise-agentic