Artificial Intelligence

Federal Judge Recognizes the Right To Train AI on Copyrighted Works

The ruling is a win not just for Anthropic, but for all users of large language models.


Judge William Alsup of the U.S. District Court for the Northern District of California ruled Monday that Anthropic did not violate the Copyright Act by training its large language model (LLM) on copyrighted books, so long as those books were legally acquired. This case is one of many filed against artificial intelligence (AI) companies over how their LLMs use copyrighted material. Here, Alsup ruled that AI companies have the right to train large language models on copyrighted material to produce original content, the same way people do.

Bartz v. Anthropic was filed in August 2024 on behalf of a class of fiction and nonfiction authors, alleging that Anthropic had built its business by "stealing hundreds of thousands of copyrighted books." The authors alleged that Anthropic downloaded known pirated versions of the plaintiffs' works, in violation of the Copyright Act, without compensating them. Central to the plaintiffs' complaint is the claim that Anthropic's "Claude LLMs compromise authors' ability to make a living" by allowing "anyone to generate—automatically and freely (or very cheaply)—texts that writers would otherwise be paid to create and sell."

Alsup agreed with the authors in part. The federal judge concluded in his ruling that Anthropic had freely downloaded over 7 million copies of copyrighted books from pirate sites in addition to purchasing and scanning physical books to amass "a central library 'of all the books in the world'…to train various large language models." Alsup affirmed that "there is no carveout…from the Copyright Act for AI companies," so the use of pirated materials was illegal. However, Alsup held that Anthropic's digital conversion of physical books wasn't a copyright violation because "storage and searchability are not creative properties."

Most significantly for the AI economy, Alsup rejected the plaintiffs' argument that "computers nonetheless should not be allowed to do what people do." Were this argument taken to its (il)logical conclusion, computers that merely perform arithmetic would be illegal—humans were the first arithmeticians, after all, and human calculators certainly lost their jobs after the invention of the digital calculator. (Perhaps we should also burn the ancient abacus!)

Alsup states that authors "cannot rightly exclude anyone from using their works for training and learning as such." While it's perfectly reasonable to expect people to pay for access to copyrighted material once, making everyone pay "for the use of a book each time they read it, each time they recall it from memory, [and] each time they later draw upon it when writing new things in new ways" would be "unthinkable," ruled Alsup. Likewise, it is fair use for AI companies to use legally acquired copyrighted works to train LLMs "to generate new text [that is] quintessentially transformative."

Alsup concluded that Anthropic's use of copyrighted books to train Claude "did not and will not displace demand for copies of Authors' works." Moreover, even if the LLM diminished demand for the authors' works due to "an explosion of competing works," the authors' complaint would be "no different" than "if they complained that training schoolchildren to write well [results] in an explosion of competing works." Such complaints do not concern the Copyright Act, ruled Alsup.

The Authors Guild expects Alsup's decision to be appealed and is "confident that its findings of fair use on the training and format-conversion issues will ultimately be reversed." Only time will tell whether the ruling survives, but if it is reversed, the public should expect the price of AI tools to rise as authors of copyrighted material demand licensing fees that markedly increase the cost of LLM training.