Introduction

Recent court rulings in the Northern District of California have profound implications for the intersection of AI training and copyright law. These decisions, particularly in Richard Kadrey, et al. v. Meta Platforms, Inc. and Andrea Bartz, et al. v. Anthropic PBC, explore the application of the fair use doctrine to the use of copyrighted books in training large language models (LLMs) [1][2][4][5][6][7][8]. The rulings suggest that such use may be considered “transformative,” potentially shielding AI developers from liability. However, these decisions are not comprehensive legal guidance, and they highlight the complexity of fair use in the context of AI.

Description

The two decisions, Kadrey v. Meta Platforms, Inc. [1][7][8] and Bartz v. Anthropic PBC [1][5][7][8], were issued within days of each other. In both, federal judges determined that using copyrighted books to train large language models (LLMs) may be considered “transformative” under the fair use doctrine, potentially shielding AI developers from liability when repurposing creative works for model training [4]. However, these rulings are limited and should not be interpreted as comprehensive legal guidance [4].

In the case against Anthropic, Judge William Alsup found that the company’s AI training qualified as transformative use, likening it to human learning and suggesting it was an acceptable form of use under the fair use criteria [1][6]. Conversely, Judge Vincent Chhabria, ruling on the Meta case just 48 hours later, emphasized the distinct differences between human learning and AI training, arguing that Meta’s practices could not be equated with human processes [1][6][8]. Both courts acknowledged the significant creative and expressive value of the works used in AI training, which supports the position of AI companies, but their assessments of potential market harm were notably simplistic, with minimal exploration of possible market losses due to AI training [6].

Both courts found that the use of legally purchased copyrighted books for training LLMs is transformative [8], significantly altering their purpose [2], even when the full text is utilized [8]. They acknowledged the creative nature of the works but concluded that the purpose of AI training—learning language patterns to generate new content—was crucial in establishing transformative use under the fair use analysis [8]. The courts emphasized that if the AI-generated outputs do not infringe upon or substitute for the original works and contribute to the development of innovative tools [7], such use may be deemed fair [7]. Federal judges in California have partially favored AI companies like Anthropic and Meta in lawsuits from authors [5], indicating that the use of copyrighted books for training generative AI models [5], such as Claude and Llama 2, may qualify as permissible fair use under US copyright law [5]. However, these rulings do not grant AI developers unrestricted rights regarding content acquisition methods [5].

The courts diverged on key aspects, particularly data acquisition and market harm [8]. In the Bartz case, the court analyzed the AI training process in two stages: data collection and model training [8]. It found that while the transformative use of legally purchased hardbacks was fair use, the use of over 7 million pirated books was a copyright violation, and the court rejected any fair use claim for those materials [2][8]. This distinction emphasizes that only lawfully obtained content can support a fair use defense [2]. Conversely, the Kadrey court did not separate the analysis into distinct stages; it focused on the transformative nature of Meta’s use of the data, reaching a fair use finding without delving into how the data was acquired [1][6][8].

The judges also differed in their assessment of market harm [8]. The Kadrey court expressed concern that AI-generated works could compete with original works, emphasizing that market harm is a critical factor in fair use; it nonetheless ruled in favor of Meta because the plaintiffs offered no evidence of market dilution [8]. In contrast, the Bartz court dismissed claims of market harm, viewing AI-generated works as mere competition rather than direct substitutes for the originals [8]. The transformative nature of the AI training process was emphasized, as it aimed to enable models to generate new, unrelated text rather than reproduce the original content of the books used [5].

Both courts noted that the AI systems were trained to minimize the risk of infringing output, underscoring the importance of mitigation efforts for AI developers [9]. The manner of acquiring copyrighted works for training was also addressed; the courts agreed that it affects the fair use analysis, though the extent of its impact remains uncertain [9]. Where materials were unlawfully obtained, the developer’s transformative purpose was weighed less heavily; where materials were lawfully purchased, the training use was deemed fair [9].

The rulings also introduced a new market dilution theory that could challenge fair use defenses in future litigation [1]. Judge Chhabria characterized Meta’s Llama LLM as “highly transformative,” while Judge Alsup praised Anthropic’s Claude chatbot for its significant transformative qualities [1]. They noted that the effectiveness of LLMs relies on the breadth and quality of the training data [1], which justified the use of full books [1].

Despite ruling in favor of fair use [1], both judges acknowledged that the AI companies had accessed the books through unauthorized means and were aware of the associated legal risks [1]. Chhabria raised the possibility of Meta facing liability for using pirated libraries [1], while Alsup indicated that Anthropic’s use of such libraries for purposes beyond LLM training could support a separate infringement claim [1]. The judges highlighted that there was no evidence of the LLMs generating outputs that were substantially similar to the authors’ works [1], which played a crucial role in their fair use determinations [1].

These rulings indicate that the legality of the source of copyrighted materials is a relevant factor in fair use analysis [9], but its significance is still being determined [9]. The US Copyright Office (USCO) has emphasized that fair use determinations are fact-specific [3], particularly considering transformative use and market effects of copyrighted works [3]. AI models used for noncommercial purposes that do not reproduce copyrighted material may favor fair use [3]. However, models trained on pirated data or whose outputs compete with original works could face legal challenges [3]. The USCO differentiates between benign internal uses and high-risk practices that involve copying expressive works from unauthorized sources [3], especially when licensing options are available [3].

Ongoing litigation suggests that further developments are expected [9]. Companies training models on large datasets should prioritize compliance strategies and vet their data sources to minimize infringement risks [9]. Additionally, without clear visibility into data sources, companies may face liability if the necessary rights to training data are not secured, highlighting the need for robust contract provisions in licensing agreements [9]. Rights holders, including novelists, visual artists, and music publishers, may find it easier to challenge AI-generated outputs, suggesting that the landscape of fair use in AI is still evolving and fraught with uncertainty [2][8]. Future legal challenges may focus on output infringement, particularly LLM-generated content that closely mirrors training materials, as generative AI continues to evolve and prompt further legal scrutiny [8].

Conclusion

The recent rulings underscore the evolving nature of fair use in the context of AI, highlighting the importance of transformative use and the legality of data acquisition. While the decisions offer some protection to AI developers, they also emphasize the need for careful consideration of market harm and the source of training data. As the legal landscape continues to develop, AI companies must remain vigilant in their compliance efforts to mitigate potential legal risks.

References

[1] https://news.bloomberglaw.com/us-law-week/big-tech-wins-in-copyright-cases-come-with-strings-attached
[2] https://technologicinnovation.com/2025/07/12/claude-ai-ruling-2025-fair-use-or-violation/
[3] https://www.thelawverse.com/p/fair-use-llm
[4] https://www.lexology.com/library/detail.aspx?g=b04952f6-db62-49d8-91a1-3a2cf1726bf4
[5] https://www.acc.com/resource-library/ai-and-fair-use-two-us-court-decisions-shaping-landscape
[6] https://www.aibase.com/news/19554
[7] https://natlawreview.com/article/ai-vs-authors-two-california-judges-two-directions-and-more-uncertainty-fair-use
[8] https://www.ecjlaw.com/ecj-blog/a-temporary-victory-what-the-new-anthropic-and-meta-rulings-actually-reveal-about-a-fair-use-defense-for-companies-accused-of-using-copyrighted-works-to-train-generative-ai-by-jason-l-haas-and-banu-naraghi
[9] https://www.jdsupra.com/legalnews/concerned-about-ai-training-data-and-4042777/