Britannica Files Federal Lawsuit Claiming ChatGPT “Memorized” Its Encyclopedia Content
Encyclopaedia Britannica is taking legal action against OpenAI, accusing the company behind ChatGPT of copying its copyrighted reference material to train artificial intelligence systems.
The lawsuit claims OpenAI built its models using large portions of Britannica’s encyclopedia and dictionary content, then generated answers that closely resemble the original writing.
According to Reuters, the complaint filed in Manhattan federal court alleges OpenAI copied nearly 100,000 articles from Britannica and its subsidiary Merriam-Webster while training AI models including GPT-4.
Britannica says those systems can produce responses that mirror its content, substituting for visits to its website and diverting readers away from its subscription and licensing platforms.
“GPT-4 itself has ‘memorized’ much of Britannica’s copyrighted content,” the lawsuit states.
The publishers also accuse OpenAI of trademark infringement, claiming the chatbot sometimes generates responses that falsely attribute text to Britannica or imply that Britannica has authorized them, potentially damaging the brand's reputation for accuracy.
OpenAI has rejected the allegations, saying its models are trained on publicly available information and that the process qualifies as fair use under copyright law.
The dispute reflects a broader legal battle between content publishers and AI developers over how large language models gather training data. News organizations, authors, and entertainment companies have filed similar claims arguing that AI systems replicate copyrighted work without compensation.
In 2025, Britannica also sued AI startup Perplexity AI over similar allegations involving article summaries and diverted web traffic.
The court will now determine whether training AI systems on large datasets that include copyrighted works crosses the legal line.
For both the publishing industry and the AI sector, the outcome could shape how future models are built.



