Kadrey v. Meta Platforms Inc: Authors Allege Mass Copyright Infringement Over AI Training Data

Introduction

In Kadrey v. Meta Platforms Inc [8], a group of authors alleges mass copyright infringement, claiming that Meta unlawfully used copyrighted works to train its AI model, Llama [4] [5] [8]. The legal proceedings focus on the fair use defense and the potential market impact of Meta’s actions.

Description

At a recent hearing in the case of Kadrey v. Meta Platforms Inc [6] [8], authors Richard Kadrey [3], Sarah Silverman [2] [3] [5] [8], Ta-Nehisi Coates [2] [8], Andrew Sean Greer [8], and others alleged mass copyright infringement against Meta. They claim that the company unlawfully trained its generative AI model, Llama [4] [5] [8], on millions of copyrighted works [3], including their own [3] [8], by downloading these materials from unauthorized platforms and piracy networks. The case is being heard in the US District Court for the Northern District of California [3] [6], with Judge Vince Chhabria presiding [3]. Meta’s defense is centered on a fair use argument, asserting that its actions are transformative and distinct from the original copyrighted works [8], which could potentially lead to a favorable ruling for the company.

The hearing focused on summary judgment motions from both parties [7], with differing views on what facts could be decided at this stage [7]. The authors contended that only Meta’s actions related to downloading and sharing “pirated” books could be ruled as infringement [7], while Meta maintained that all aspects of the copyright claim [7], except for the alleged unauthorized distribution [7], should be considered in the context of its fair use defense [7]. Judge Chhabria recognized the potential for transformative use in AI training but emphasized that this does not automatically equate to fair use [5]. He posed challenging questions to both sides and expressed skepticism regarding Meta’s assertion that its AI training data constitutes fair use. The judge highlighted the need for the authors to provide specific evidence of market harm to support their claims, indicating that without proof of significant impact on their sales [1], their case could be undermined by fair use [1].

The discussion primarily centered on the potential market impact of Meta’s technology [7], which could lead to an oversaturation of competing works [7]. Judge Chhabria referred to this concern as an “obliteration” theory of market harm and sought clarification on whether the authors had provided sufficient evidence to support this theory [7]. The burden of proof for fair use [7], as an affirmative defense [7], was also a topic of discussion [7]. Meta’s defense reiterated that its AI training is transformative and does not replicate authors’ ideas or replace their works in the market [1]. However, the plaintiffs countered that Llama’s capabilities pose a substitution risk [4], and that the scale of copying far exceeds what was upheld in previous cases like Authors Guild v. Google Books [4].

Internal communications reportedly indicate that Meta engineers were aware of the legal risks associated with their actions [4], which could support claims of willful infringement and expose the company to substantial statutory damages [4]. The method of data acquisition may significantly impact the fair use analysis [4], with plaintiffs arguing that bad faith actions [4], such as torrenting from pirate libraries [4], undermine any fair use defense [4]. The judge’s inquiries suggested that a trial might be necessary to resolve these issues [7], in which case both parties’ motions for summary judgment would be denied [7]. If the main dispute revolves around market harm from the potential flooding of the book market [7], Meta could be positioned favorably [7].

In November 2023 [3], the court dismissed most of the plaintiffs’ claims but allowed them to amend their complaint to demonstrate a more direct link to actual harm [3], which they subsequently did in December 2023 [3]. If the case proceeds to trial [7], the judge will need to address motions for judgment as a matter of law afterward [7]. If he concludes that Meta’s use was highly transformative and that lost licensing does not sufficiently demonstrate market harm [7], the focus will narrow to the theory of market flooding [7]. Regarding the plaintiffs’ claims under the DMCA and issues related to file sharing [7], there was minimal discussion [7]. The judge did inquire about the implications of recognizing fair use for copies obtained from “pirated” sources [7], with arguments presented by both sides about the legitimacy of such sources and their impact on fair use analysis [7]. The judge appeared concerned about the implications of shadow libraries [7], but this issue was secondary to the primary focus on market harm [7].

Additionally, the authors have urged the court to disregard new summary judgment evidence introduced by Meta in its reply briefs, arguing that the evidence conflicts with witness testimony regarding the company’s use of datasets containing pirated books and is untimely [6], having been submitted after the close of discovery. The plaintiffs specifically challenge a declaration regarding Meta’s use of the pirate site LibGen [3], citing a prior witness’s lack of knowledge about LibGen during deposition [3]. They also contest a new declaration from Meta’s Dr. Michael Sinkinson [3], asserting that it violates a court order that prohibited further expert discovery [3]. They argue that Meta has misinterpreted this order [3], which specifically disallowed additional reply declarations from the plaintiffs’ experts [3].

The potential ruling in this case could reshape the AI industry’s approach to data sourcing [4], requiring companies to ensure clean sourcing for training datasets to avoid legal repercussions [4]. If the court finds that unlawful acquisition taints transformative uses [4], it may lead to stricter documentation requirements for AI training data [4]. Conversely [4], a ruling in favor of Meta could encourage more aggressive data acquisition practices [4], potentially allowing companies to use any publicly available online content for AI training [4], provided the final product is deemed transformative [4]. The judge plans to take time before making a decision [1], which could have broader implications for the AI industry regarding the legality of training practices and may set a precedent for future copyright cases involving AI systems and their training methodologies [1].

Conclusion

The outcome of Kadrey v. Meta Platforms Inc [2] could significantly influence the AI industry’s data sourcing practices. A ruling against Meta may necessitate stricter documentation and sourcing standards, while a favorable ruling for Meta could embolden more aggressive data acquisition strategies. The case’s resolution will likely have lasting implications for copyright law as it pertains to AI training methodologies.

References

[1] https://arstechnica.com/tech-policy/2025/05/judge-on-metas-ai-training-i-just-dont-understand-how-that-can-be-fair-use/
[2] https://itmagazine.com/2025/05/02/judge-highlights-metas-ai-copyright-case-as-pivotal-for-the-future-of-music-is-this-the-next-taylor-swift/
[3] https://www.mckoolsmith.com/newsroom-ailitigation-20
[4] https://www.masslawblog.com/copyright/copyright-ai-and-metas-torrent-problem/
[5] https://the-decoder.com/us-judge-questions-metas-claim-that-training-ai-on-copyrighted-books-is-fair-use/
[6] https://news.bloomberglaw.com/ip-law/meta-accused-by-authors-of-improper-ai-evidence-submission
[7] https://chatgptiseatingtheworld.com/2025/05/05/at-hearing-before-judge-chhabria-meta-appeared-one-step-away-from-prevailing-on-fair-use-defense/
[8] https://news.bloomberglaw.com/ip-law/meta-faces-copyright-reckoning-in-authors-generative-ai-case
