AI Companies Face Copyright Lawsuits Over Use of Shadow Libraries

Introduction

In recent legal proceedings, a significant issue has emerged concerning AI companies' use of “shadow libraries,” which contain unauthorized copies of books [3]. This has led to numerous copyright lawsuits, highlighting the complex intersection of AI technology and intellectual property law [4]. These cases address the use of copyrighted materials for training generative AI models without permission, raising questions about copyright infringement [4], fair use [1] [2], and the copyrightability of AI-generated content [4].

Description

In recent copyright lawsuits against AI companies [3], a significant issue has emerged regarding the use of “shadow libraries,” which contain unauthorized copies of books [3]. These libraries have drawn criticism from authors and garnered considerable media attention [3]. Judges Chhabria and Alsup have expressed concerns about this matter during summary judgment hearings in Kadrey v. Meta and Bartz v. Anthropic [3]. Kadrey v. Meta [1] [3] is particularly notable as it is the first case to address the legal implications of using copyrighted materials for training generative AI models without permission, amidst over 40 lawsuits filed by copyright owners against various AI developers since January 2023 [1].

Numerous lawsuits have surfaced concerning the use of copyrighted materials in training generative AI systems [4], with creators alleging large-scale copyright infringement [4]. Key cases include Andersen v. Stability AI [2] [4], which addresses the use of art for training diffusion models [4], and Tremblay v. OpenAI [4], focusing on the use of books for training language models [4]. Several of these cases have been consolidated for efficiency, including Authors Guild v. OpenAI [4], which highlights issues of unauthorized use and potential memorization of copyrighted content [4]. Additionally, Getty Images v. Stability AI raises questions about the use of images and videos for AI training [4], while RIAA v. Suno and Udio pertains to the use of songs for training audio-generating models [4]. These cases reflect ongoing legal challenges regarding piracy [4], web scraping [4], and the copyrightability of generated content [4], underscoring the complex intersection of AI technology and intellectual property law [4].

Over the past year [2], many AI-related lawsuits have primarily focused on infringement claims [2], although none have yet reached a jury trial. A notable case, Andersen v. Stability AI [2] [4], has generated significant attention [2], with the plaintiffs’ lawyer describing a recent court decision as a substantial advancement for the case [2]. However, many claims in these lawsuits have struggled to survive pretrial motions to dismiss [2], which are typically challenging for plaintiffs [2].

To withstand a motion to dismiss [2], plaintiffs must present plausible claims [2], with the court assuming all factual allegations are true and interpreting them favorably [2]. Key legal questions, such as direct copyright infringement and fair use [2], often remain unresolved at this stage [2]. Fair use analysis involves balancing four factors [1], with the fourth factor, impact on the market for the original work, being particularly crucial [1]. This factor examines whether the new use competes with or undermines the market for the original work [1]. If AI companies can demonstrate that they would prevail as a matter of law [2], the claims may be dismissed [2]. Dismissed claims can sometimes be amended [2], as seen in Andersen v. Stability AI [2] [4], where initial claims were allowed to be revised [2].

Most AI lawsuits remain in early stages [2], with recent court rulings primarily addressing defendants’ motions to dismiss [2]. A common claim regarding the removal of copyright management information (CMI) has consistently failed to survive these motions [2], as courts have found insufficient evidence of intentional removal by AI companies [2]. Additionally, claims asserting that AI models constitute derivative works of training materials have also been dismissed [2], with courts rejecting the notion that an AI model can be considered a derivative work based solely on the use of preexisting works for training [2].

Vicarious liability claims have similarly been dismissed when plaintiffs cannot demonstrate direct infringement [2]. Many non-copyright state law claims have also been dismissed due to copyright preemption [2], as illustrated in Andersen v. Stability AI [2] [4], where an unjust enrichment claim was rejected because it lacked any element distinguishing it from the rights copyright already protects [2]. On the fair use question, the burden of proof lies with the defendant to demonstrate that their use does not harm the market for the original work [1]. Critics argue that the “market dilution” theory misinterprets fair use [1], suggesting that allowing copyright holders to restrict AI-generated outputs would unjustly extend their monopoly [1].

Notably, many dismissed claims across various AI lawsuits resemble one another and were often filed by the same law firm [2]. These complaints tend to contain broad claims and class designations that are susceptible to dismissal [2]. The class action aspect of these lawsuits is expected to receive further analysis [2], as the legal landscape continues to evolve in response to the rapid advancements in generative AI technology.

Conclusion

The ongoing legal battles between AI companies and copyright holders underscore the challenges of balancing technological innovation with intellectual property rights. As these cases progress, they will likely shape the future of AI development and its legal framework, influencing how copyrighted materials are used in training AI models. The outcomes of these lawsuits could have significant implications for both the AI industry and copyright law, potentially redefining the boundaries of fair use and copyright infringement in the digital age.

References

[1] https://www.copyhype.com/2025/05/generative-ai-copyright-and-market-harm/
[2] https://www.authorsalliance.org/2024/09/03/the-ai-copyright-hype-legal-claims-that-didnt-hold-up/
[3] https://chatgptiseatingtheworld.com/2025/06/06/scholarship-how-should-courts-weigh-ai-companies-use-of-shadow-libraries-of-pirated-books/
[4] https://aiwatch.dog/lawsuits
