ITBench-AA: Frontier Models Score Below 50% on First Agentic Enterprise-IT Benchmark

Published 2026-05-27 | Ingested 2026-05-29 | Agentic AI | Medium

Summary

Artificial Analysis and IBM's Software Innovation Lab published ITBench-AA on May 27, 2026, billing it as the first benchmark for agentic enterprise IT work. It scores models on Site Reliability Engineering (SRE) incident diagnosis. Agents read alerts, events, traces, metrics, logs, and topology from sandboxed Kubernetes incident snapshots, then submit a structured list of root-cause entities within a limit of 100 shell-access turns. The benchmark comprises 59 tasks (40 public, 19 held out), and submissions are scored by average precision.
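The summary does not specify how average precision is computed over a submitted entity list. The sketch below assumes the standard information-retrieval definition: the list is treated as ranked, each correct entity contributes the precision at its rank, and that sum is divided by the number of ground-truth entities. It also assumes the headline score is the mean over tasks. ITBench-AA's actual matching rules and normalization may differ. The function names and entity strings are hypothetical.

```python
def average_precision(predicted: list[str], gold: set[str]) -> float:
    """Standard IR average precision of a ranked list (assumed variant).

    AP = (1 / |gold|) * sum of precision@k over ranks k holding a gold entity.
    Repeated predictions are collapsed to their first occurrence.
    """
    if not gold:
        return 0.0
    # Drop repeats while keeping first-seen order, so one entity cannot score twice.
    ranked = list(dict.fromkeys(predicted))
    hits = 0
    precision_sum = 0.0
    for k, entity in enumerate(ranked, start=1):
        if entity in gold:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    return precision_sum / len(gold)


def mean_average_precision(tasks: list[tuple[list[str], set[str]]]) -> float:
    """Mean of per-task AP; assumes the benchmark score averages tasks equally."""
    if not tasks:
        return 0.0
    return sum(average_precision(p, g) for p, g in tasks) / len(tasks)


if __name__ == "__main__":
    # Hypothetical Kubernetes entity identifiers, for illustration only.
    gold = {"pod/checkout", "svc/payments"}
    submission = ["pod/checkout", "node/worker-2", "svc/payments"]
    print(round(average_precision(submission, gold), 4))
```

In this example the hits at ranks 1 and 3 contribute 1/1 and 2/3, so AP = (1 + 2/3) / 2 ≈ 0.8333. Under this definition, placing a wrong entity ahead of a correct one lowers the score. Ordering therefore matters as well as coverage.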

Alignment: New signal not yet covered

Related Positions: Agentic Workflows, Enterprise AI Delivery, AI Governance and Risk

Related Partnerships: Anthropic (Claude)

benchmark, agentic-ai, enterprise-it, sre, evaluation, ibm, artificial-analysis, open-weights, agent-readiness