US Copyright Office: AI Training Not Categorically Fair Use

Summary

The US Copyright Office released a pre-publication version of Part 3 of its AI and copyright series, concluding that using copyrighted works to train AI systems does not categorically qualify as fair use. The report found that commerciality and market harm weigh heavily against AI developers in a standard four-factor fair use analysis, and that training on pirated datasets substantially undermines any fair use defense. The Office declined to recommend a compulsory licensing regime, instead calling for scalable voluntary and collective licensing solutions.

What Happened

Part 3 completed the Office's core analytical work on AI and copyright, addressing the question that had been generating the most litigation: does ingesting copyrighted material to train an AI model constitute fair use? The Office applied the standard four-factor framework and found no categorical answer, but the weight of its analysis cut against the industry's broad fair use claims.

On the first factor (purpose and character of the use), the Office found that the commercial nature of frontier AI development weighed against fair use, rejecting the argument that training is inherently transformative in a legally meaningful sense. On the fourth factor (effect on the potential market for the original work), the Office found substantial market harm where AI outputs could substitute for licensed content, undermining existing and emerging licensing markets. The report identified the use of pirated datasets such as Books3 and LibGen as particularly damaging to fair use arguments, noting that knowingly training on infringing source material could not benefit from a defense designed to protect good-faith uses.

The report declined to recommend a compulsory licensing model, which some AI developers had proposed as a compromise. Instead, it recommended voluntary licensing frameworks and encouraged the development of collective licensing mechanisms analogous to those used in the music industry, where blanket licenses enable large-scale lawful use at manageable transaction costs.

Why It Matters

Part 3 is the most consequential policy statement yet in the AI training data debate. Although the report carries no binding legal authority, the Copyright Office's analysis carries significant persuasive weight with courts and Congress alike. Its conclusion that training is not categorically fair use, combined with its specific analysis of how pirated training data undermines the defense, substantially narrows the fair use protection AI developers had been assuming. The voluntary licensing recommendation shifts the policy debate from whether AI companies must pay rights holders to how those payments should be structured and collected.

