Stanford Researchers Extract Near-Complete Copyrighted Books From Production LLMs
Published 2026-01-16 | AI Regulation and Governance | High
Summary
Stanford University researchers published findings (originally released January 6, 2026, with continued coverage through mid-January) demonstrating that four major production language models — Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3 — can reproduce large portions of copyrighted books despite safety measures designed to prevent memorization. In the most extreme case, Claude 3.7 Sonnet reproduced 95.8% of Harry Potter and the Sorcerer's Stone nearly verbatim. Gemini 2.5 Pro yielded
Alignment: Reinforces current position
Related Positions: ai-governance-and-risk.md
copyright, memorization, stanford, llm-safety, claude, gpt, gemini, grok, intellectual-property, legal-risk