Legal Dispute Over Copyrighted Materials in AI Training Models

Introduction

The ongoing legal dispute in the Northern District of California involves significant questions about the use of copyrighted materials in training AI models. The case primarily concerns Anthropic PBC’s alleged improper acquisition of copyrighted works for its AI assistant, Claude [1] [3] [4], and the broader implications for copyright law and AI technology.

Description

In the case, which is pending in the Northern District of California, the parties are contesting the production of datasets in discovery, and Anthropic has disclosed that it spent tens of millions of dollars compiling its own scanned books dataset [2]. The case centers on allegations that Anthropic PBC improperly acquired copyrighted works by downloading them from websites offering pirated content and used them to train its AI assistant, Claude [1] [3] [4]. This raises significant legal questions about such acquisition methods [3], particularly regarding potential harm to the market for copyrighted works. AI models can generate outputs that may substitute for the original works in their training data [5], causing lost sales and market dilution even when those outputs are not substantially similar to any specific copyrighted work [5].

As Judge Alsup considers Anthropic’s motion for summary judgment on fair use [2], he has indicated a potential legal distinction: while acquiring copyrighted material for a training library may violate the Copyright Act, using that material to train a large language model (LLM) could be considered fair use [3] [4]. This suggests a separation between how training data is acquired and how it is subsequently used [3] [4]. The judge noted that LLMs do not store training materials but instead use them to adjust model weights and improve performance [3], which may support a fair use defense [3]. However, the manner of data acquisition remains critical [3], as improper acquisition, such as unauthorized downloading, could still lead to liability under copyright law [4].

Judges Chhabria and Alsup have both expressed concern about the unauthorized use of pirated datasets [1], yet both appear to lean toward the view that the defendants’ use of the works to develop AI technology may be transformative [1]. This highlights the importance of how training data is sourced [4] and suggests that LLM developers may need to implement stricter licensing or auditing processes for their training datasets [4].

Two related cases, Kadrey v. Meta and Bartz v. Anthropic [1] [2] [3] [4] [5], are currently pending, and both may be influenced by a recent Copyright Office report on fair use in AI [5] that could shape how these cases, and future AI litigation, are resolved. The report could be introduced to the court as supplemental authority, possibly without the parties having an opportunity to respond to its analysis [5]. However, it would receive only “Skidmore” deference [5], meaning it would carry weight only to the extent the court finds its reasoning persuasive [5]. The situation is unusual [5]: the report could effectively be read as an amicus brief from the Copyright Office, even though it was not submitted through the standard amicus process [5].

Following a recent hearing [4], the judge requested additional briefing on a relevant Second Circuit decision that recognized certain transformative uses of copyrighted material as fair use [3] [4], indicating that the court is considering whether similar reasoning could apply to LLM training [3] [4]. The outcome of the Anthropic case may have significant implications for the future of AI copyright law [4], depending on the judge’s ruling on summary judgment and the potential for further legal challenges [4].

Conclusion

The resolution of this case could set a precedent for how AI developers approach the acquisition and use of copyrighted materials. It may lead to stricter guidelines and practices for sourcing training data, impacting the development and deployment of AI technologies. The court’s decision will likely influence future litigation and the evolving landscape of AI copyright law.

References

[1] https://chatgptiseatingtheworld.com/2025/05/28/who-will-decide-fair-use-first-judge-alsup-or-judge-chhabria/
[2] https://chatgptiseatingtheworld.com/2025/06/09/bartz-v-anthropic-parties-fight-over-production-of-datasets-spreadsheet-and-books-dataset-outside-of-inspection-environment-anthropic-reveals-it-spent-tens-of-millions-of-dollars-to-compile-its-own/
[3] https://viewpoints.reedsmith.com/post/102kcyr/judge-alsup-llm-training-may-be-fair-use-but-acquisition-still-matters
[4] https://www.lexology.com/library/detail.aspx?g=9abbf9ac-af2d-42c8-bb63-8043df2b3549
[5] https://www.authorsalliance.org/2025/05/12/the-copyright-office-report-about-fair-use-in-ai-the-dismissal-of-the-register-of-copyrights-a-drama-in-three-parts/