Stanford Researchers Extract Near-Complete Copyrighted Books From Production LLMs

Published 2026-01-16 | AI Regulation and Governance | High

Summary

Stanford University researchers published findings (originally released January 6, 2026, with continued coverage through mid-January) demonstrating that four major production language models (Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3) can reproduce large portions of copyrighted books despite safeguards designed to prevent verbatim reproduction of memorized training data. In the most extreme case, Claude 3.7 Sonnet reproduced 95.8% of Harry Potter and the Sorcerer's Stone nearly verbatim. Gemini 2.5 Pro yielded

Alignment: Reinforces current position

Related Positions: ai-governance-and-risk.md

copyright, memorization, stanford, llm-safety, claude, gpt, gemini, grok, intellectual-property, legal-risk