Claude Opus 4.6 Surpasses GPT-5.2 on GDPval-AA Agentic Knowledge Work Benchmark

Published 2026-04-08 | Foundation Models | High | ⭐ Timeline Candidate

Summary

Anthropic's Claude Opus 4.6 has reportedly surpassed OpenAI's GPT-5.2 to take the top position on the GDPval-AA benchmark, which evaluates AI models on agentic, real-world knowledge work tasks. The benchmark appears to focus on models' ability to perform complex, multi-step workflows representative of actual enterprise knowledge work, a category that maps directly to agentic AI capabilities in professional settings. This result is significant for organizations pursuing multi-

Alignment: Reinforces current position

Related Positions: agentic-workflows.md, multi-model-multi-vendor.md, enterprise-ai-delivery.md

Related Partnerships: anthropic-claude.md

claude-opus, anthropic, gpt-5, openai, agentic-benchmarks, gdpval-aa, foundation-models, model-evaluation, knowledge-work, multi-model-strategy