Introduction

The ongoing debate over the use of copyrighted content by AI companies has intensified, with US Senator Josh Hawley leading a Senate Judiciary subcommittee hearing to address the issue. The hearing focused on the alleged piracy of copyrighted materials by major tech firms, particularly in training artificial intelligence models, and the implications for intellectual property rights.

Description

US Senator Josh Hawley chaired a Senate Judiciary subcommittee hearing focused on the role of Big Tech in the piracy of copyrighted content for training artificial intelligence (AI) models [2], characterizing this practice as potentially the largest instance of intellectual property theft in American history [7]. He raised concerns regarding Meta’s alleged violation of copyright laws [3], specifically its use of over 200 terabytes of published works from authors without compensation to enhance its AI capabilities [3]. Hawley emphasized the moral implications of allowing a few large corporations to profit from the creative works of others without compensation [4], criticizing AI companies for invoking technological advancement as a pretext for financial gain at the expense of copyright holders. This stance marked a contrast to the more lenient approach toward the tech sector previously taken by President Trump.

The discussion centered on the fair use doctrine, which tech companies have invoked in response to copyright infringement claims [5], particularly the extent to which this defense can shield AI companies that have downloaded large volumes of books from unauthorized sources [5]. During the hearing [7], testimony from five witnesses addressed the implications of AI training methods for copyright law, with four witnesses arguing that these methods are not protected by fair use [1]. However, law professor Edward Lee contended that some uses could qualify as fair use, urging Congress to allow ongoing court cases to work through these complex issues [1].

In a related case [3], US District Judge William Alsup ruled that while Anthropic could assert “fair use” for some copyrighted works [3], its storage of over 7 million pirated books constituted copyright infringement [3], with a trial scheduled for December to assess the extent of Anthropic’s violations [3]. In contrast [8], Judge Vince Chhabria ruled against the authors in a case involving Meta [8], even while indicating that training AI on copyrighted content without permission would in many circumstances not qualify as fair use [8]; he found that the authors had failed to present their case effectively [8].

Bestselling author David Baldacci shared his personal experiences with copyright infringement [6], expressing a profound sense of violation over the unauthorized use of his works and emphasizing the detrimental effects of mass piracy on authors and creative professionals. He is involved in a class action suit against OpenAI, voicing concerns about AI-generated works resembling his own and the potential for market oversaturation. Baldacci noted that online platforms are beginning to require authors to disclose if their works involve AI [1], with Amazon recently limiting the number of books authors can publish daily to address this issue [1].

Maxwell Pritt [1] [6] [8], legal counsel for plaintiffs in lawsuits against AI companies and a partner at Boies Schiller Flexner, underscored these firms’ significant reliance on unlawfully obtained copyrighted materials to gain a competitive advantage [6]. He revealed that internal documents from Meta suggested that Mark Zuckerberg approved the use of pirated material [8], with staff aware of the illegality of their actions [8]. Pritt emphasized that the Copyright Act contains no exemption for AI companies engaging in mass piracy [1], and he noted that internal communications showed Meta acknowledging the piracy and attempting to conceal it through non-Meta servers [2]. Hawley presented these internal communications during the hearing, which included expressions of ethical concerns from Meta employees about using pirated material.

Professor Bhamati Viswanathan from New England Law argued that the actions of AI companies are illegal [6], particularly when they involve the use of pirated works for training purposes [7]. She emphasized that the justification of needing materials for development does not excuse the act of stealing [7], especially when these companies source from already pirated content [7], thus compounding the illegality [7]. Viswanathan clarified that not all data used for AI training is pirated, as companies may also take material that could have been licensed without paying for it [3]. She noted that criminal copyright liability turns on willfulness and commercial advantage [3], suggesting that Meta acted willfully and for profit [3], indicating awareness of the illegality of its actions [3] [4]. Furthermore, she referenced the repository ‘Anna’s Archive,’ which provided extensive datasets of stolen copyright-protected materials to AI developers [6], illustrating the challenges the creative community faces in protecting its work against such exploitation. Websites facilitating piracy for AI companies have faced federal prosecution [6], indicating that involvement with these networks supports criminal activity [6].

In addition to these concerns, US senators have raised alarms about companies like Meta and Anthropic training AI models on copyrighted content [9], including pirated materials [3] [9]. A number of lawsuits have been filed against major AI firms [9], including Anthropic [9], OpenAI [1] [9], Meta [1] [2] [3] [4] [6] [7] [8] [9], Microsoft [9], and Google [9], for allegedly using copyrighted works without permission [9]. In 2023, a group of authors [9], including Sarah Silverman and Ta-Nehisi Coates [9], sued Meta for copyright infringement related to its Llama AI models [9], but a federal judge dismissed their claims [9], citing fair use [8] [9]. Other authors are pursuing similar legal actions against OpenAI and Anthropic [9], with a recent ruling favoring Anthropic [9].

Experts [2] [4] [7] [8] [9], such as Michael Smith from Carnegie Mellon University [9], have noted that AI companies are echoing arguments made by early internet firms [9], suggesting that strict enforcement of copyright law could hinder innovation [9]. Smith proposed the development of a licensing model for generative AI [9], akin to music licensing in the early 2000s [9]. The legal landscape surrounding AI training on copyrighted material remains complex [9], with US courts facing challenges in addressing these issues [9]. There is currently no consensus on fair use [9], and the US Copyright Office has indicated that not all uses of copyrighted material in AI training can be classified as fair use [9]. The Copyright Office has suggested the potential establishment of a licensing regime to ensure creators are compensated for the use of their works in AI model training [9], emphasizing the need for enforcement to protect content creators [9].

Senators Hawley and Durbin questioned Lee about whether AI companies benefit from unpaid content at the expense of creators [1]; Lee acknowledged the concern but stressed the need for US competitiveness in AI against China [1]. Hawley challenged the fairness of corporations profiting from the work of American authors [1], while Lee defended the fair use doctrine as a means of balancing innovation and copyright protection [1].

Conclusion

The debate over AI companies’ use of copyrighted content underscores the tension between technological innovation and intellectual property rights. The hearings and legal battles highlight the need for a balanced approach that protects creators while fostering innovation. The potential establishment of a licensing regime could provide a pathway to ensure fair compensation for creators, while ongoing court cases may further clarify the application of fair use in the context of AI. The outcome of these discussions and legal proceedings will have significant implications for the future of AI development and copyright law.

References

[1] https://www.publishersweekly.com/pw/by-topic/digital/copyright/article/98233-senate-hearing-debates-ai-training-on-copyrighted-works.html
[2] https://www.hawley.senate.gov/chairman-hawley-exposes-big-techs-complicity-in-piracy-to-train-ai-models-willfulness-to-bankrupt-u-s-creative-community/
[3] https://www.foxbusiness.com/politics/senator-slams-big-techs-role-pirating-copyrighted-books-ai-training-purposes
[4] https://www.advanced-television.com/2025/07/17/us-senator-exposes-big-techs-piracy-complicity/
[5] https://news.bloomberglaw.com/ip-law/ai-training-on-pirated-data-targeted-at-senate-copyright-hearing
[6] https://www.transparencycoalition.ai/news/senate-hearing-on-ai-and-copyright
[7] https://ipwatchdog.com/2025/07/20/hawley-says-congress-must-step-in-to-fix-ai-companies-mass-theft-of-copyrighted-works/id=190469/
[8] https://www.musicbusinessworldwide.com/ai-companies-accused-of-largest-domestic-piracy-of-ip-in-our-nations-history-at-congressional-hearing-led-by-maga-republican/
[9] https://www.techtarget.com/searchenterpriseai/news/366627854/AI-training-copyright-issues-headline-US-Senate-hearing