Introduction

The article “Fair Use and the Origin of AI Training” critically examines the US Copyright Office’s Pre-Publication Report on AI Training [4]. The report addresses the legal issues that generative AI raises under copyright law, focusing on the implications for companies that use copyrighted works to train AI models [3]. The article highlights that the report offers no definitive answer on fair use claims by AI companies and explores the potential for copyright infringement claims.

Description

The US Copyright Office’s Pre-Publication Report on AI Training [4] does not provide definitive clarity on whether AI companies can claim fair use for copyrighted content, leaving room for potential copyright infringement claims [5]. It indicates potential agreement with creators who argue that using their work for AI training may constitute copyright infringement [6], particularly as Big Tech companies draw on a diverse array of creators, including scientists, journalists, filmmakers, and artists [2] [6] [7], to enhance their AI systems [6].

The report emphasizes the need for high-quality content while questioning the volume of data necessary for effective model training, noting that data curation practices vary among developers [7]. It argues that companies may infringe on copyright protections when they copy materials for training purposes, especially if the numerical parameters of AI models can reproduce copyrighted works as memorized examples [3]. AI companies assert that more data leads to better models, but this practice raises concerns about potential violations of copyright law and has led to lawsuits against companies like OpenAI by creators claiming unauthorized use of their copyrighted works [6].

In a recent hearing in a case involving Meta AI, Judge Vince Chhabria expressed skepticism about whether AI training can be classified as fair use, as Meta faces allegations of copyright infringement from book authors [2]. Meta contends that AI training should be considered fair use, arguing that it transforms existing works into new creations without replicating the original ideas or replacing the authors’ market [2]. Chhabria challenged this argument, indicating that he did not see how AI training could be deemed fair use and questioning the validity of Meta’s claims. He emphasized the importance of potential market harms, suggesting that the outcome may hinge on whether authors can substantiate claims of financial loss due to AI training [2].

The article contends that the report’s endorsement of a new theory of market dilution improperly expands copyright protection by shielding copyright holders from competition posed by new, noninfringing works [4]. Transforming copyright from a limited monopoly into a general monopoly against competition undermines the public benefit of competition, as established in Sega Enterprises v. Accolade [4]. The article emphasizes that unless a copyright holder can prove infringement, competing works should be encouraged [4].

Furthermore, the article argues that protecting copyright holders from competition in entire genres contradicts the purpose of copyright [4], which is to promote progress [4]. While the fair use defense allows for some copying of copyrighted works [3], the report suggests that many uses by AI models may be considered transformative [3], particularly when training on diverse datasets. However, it concludes that commercializing copyrighted works in training data to compete with original works is unlikely to qualify for the fair use exception [3], as this could dilute the market for the original works [3].

The report highlights the complexity of determining fair use [6], noting that the assessment will depend on multiple factors, including the purpose of the use, the nature of the works, the amount used, and the market effect [5] [6]. It challenges the notion that AI training is inherently transformative, emphasizing that it does not equate to human learning and is not solely for expressive purposes [6]. The article argues that this expansive view of copyright potentially violates the Progress Clause and the First Amendment rights of those using AI generators to create new, noninfringing expression [4]. It asserts that as long as new expression does not infringe on original works, it is protected speech under the First Amendment [4].

In its evaluation of fair use, the article highlights the importance of the purpose and character of the use of copyrighted works, drawing on the US Supreme Court’s decision in Andy Warhol Foundation for the Arts v. Goldsmith (2023) [7]. It concludes that reducing the production of noninfringing works due to speculative fears of market dilution contradicts the principles of free expression that copyright should support [4]. The potential market harm from AI training is significant, as models can create outputs that directly substitute for copyrighted works, leading to lost sales [1]. Even non-substantially similar outputs can dilute the market for similar works, impacting their visibility and sales [1].

The future of the report remains uncertain [3], particularly following the termination of US Copyright Office director Shira Perlmutter by President Donald Trump [6], which raised suspicions regarding the timing of this decision in relation to the report’s findings [6]. The timing of the report itself raises concerns, as it may influence court decisions in ongoing AI-related cases without allowing for responses from involved parties [1]. Companies involved in AI development should take its findings into account in their operations [3], as the report could be presented as supplemental authority in court [1], albeit entitled to limited deference [1]. Licensing agreements are presented as a viable alternative to litigation [5], with some publishers already negotiating deals with AI companies for access to their content [5]. However, concerns arise that reliance on licensing could favor larger tech companies [5], potentially marginalizing smaller developers [5]. The report concludes that these market dynamics should be addressed through antitrust laws rather than affecting fair use determinations [5].

Conclusion

The article underscores the significant implications of the US Copyright Office’s report on AI training, particularly in the realm of copyright law and fair use. It highlights the potential for increased litigation and market disruption, emphasizing the need for clear legal guidelines. The report’s findings could influence ongoing and future court cases, and companies in the AI sector must consider these implications in their operations. The article suggests that while licensing agreements offer a potential solution, they may disproportionately benefit larger companies, necessitating consideration of antitrust measures to ensure fair competition.

References

[1] https://www.authorsalliance.org/2025/05/12/the-copyright-office-report-about-fair-use-in-ai-the-dismissal-of-the-register-of-copyrights-a-drama-in-three-parts/
[2] https://ediscoverytoday.com/2025/05/06/judge-skeptical-that-ai-training-is-fair-use-artificial-intelligence-trends/
[3] https://natlawreview.com/article/generative-ai-training-may-not-qualify-fair-use-defense
[4] https://chatgptiseatingtheworld.com/2025/05/14/my-paper-on-fair-use-and-the-origin-of-ai-training-or-why-the-copyright-office-report-is-wrong-about-fair-use-and-why-courts-should-reject-its-view/
[5] https://www.cnet.com/tech/services-and-software/copyright-office-punts-on-ai-and-fair-use-one-of-the-biggest-questions-surrounding-gen-ai/
[6] https://www.businessinsider.com/ai-training-copyright-laws-big-tech-fair-use-openai-meta-2025-5
[7] https://ipwatchdog.com/2025/05/12/copyright-office-weighs-ai-fair-use-amidst-major-leadership-shakeup/id=188814/