Federal Judge Recognizes the Right To Train AI on Copyrighted Works
The ruling is a win not just for Anthropic, but for all users of large language models.

Judge William Alsup of the U.S. District Court for the Northern District of California ruled Monday that Anthropic did not violate the Copyright Act by training its large language model (LLM) on copyrighted books, so long as those books were legally acquired. This case is one of many filed against artificial intelligence (AI) companies over the way their LLMs use copyrighted material. Here, Alsup ruled that AI companies have the right to train LLMs on copyrighted material to produce original content, the same way people do.
Bartz v. Anthropic was filed in August 2024 on behalf of a class of fiction and nonfiction authors, alleging that Anthropic had built its business by "stealing hundreds of thousands of copyrighted books." The authors alleged that Anthropic downloaded known pirated versions of the plaintiffs' works, in violation of the Copyright Act, without compensating them. Central to the plaintiffs' complaint is the claim that Anthropic's "Claude LLMs compromise authors' ability to make a living" by allowing "anyone to generate—automatically and freely (or very cheaply)—texts that writers would otherwise be paid to create and sell."
Alsup agreed with the authors in part. The federal judge concluded in his ruling that Anthropic had freely downloaded over 7 million copies of copyrighted books from pirate sites in addition to purchasing and scanning physical books to amass "a central library 'of all the books in the world'…to train various large language models." Alsup affirmed that "there is no carveout…from the Copyright Act for AI companies," so the use of pirated materials was illegal. However, Alsup held that Anthropic's digital conversion of physical books wasn't a copyright violation because "storage and searchability are not creative properties."
Most significantly for the AI economy, Alsup rejected the plaintiffs' argument that "computers nonetheless should not be allowed to do what people do." Were this argument brought to its (il)logical conclusion, computers that merely perform arithmetic would be illegal—humans were the first arithmeticians, after all, and human calculators certainly lost their jobs after the invention of the digital calculator. (Perhaps we should also burn the ancient abacus!)
Alsup stated that authors "cannot rightly exclude anyone from using their works for training and learning as such." While it's perfectly reasonable to expect people to pay for access to copyrighted material once, making everyone pay "for the use of a book each time they read it, each time they recall it from memory, [and] each time they later draw upon it when writing new things in new ways" would be "unthinkable," ruled Alsup. Likewise, it is fair use for AI companies to use legally acquired copyrighted works to train LLMs "to generate new text [that is] quintessentially transformative."
Alsup concluded that Anthropic's use of copyrighted books to train Claude "did not and will not displace demand for copies of Authors' works." Moreover, even if the LLM diminished demand for the authors' works due to "an explosion of competing works," the authors' complaint would be "no different" than "if they complained that training schoolchildren to write well [results] in an explosion of competing works." Such complaints do not concern the Copyright Act, ruled Alsup.
The Authors Guild expects Alsup's decision to be appealed and is "confident that its findings of fair use on the training and format-conversion issues will ultimately be reversed." Only time will tell whether the ruling is reversed, but if it is, the public should expect the price of AI tools to increase as authors of copyrighted material markedly hike licensing fees for LLM training.
Federal Judge Recognizes the Right To Train AI on Copyrighted Works
Am I the only one who read "Recognizes" as "Revokes" at first?
You get so used to seeing the word "revoked," you just assume that's what it was.
So IP is heavily protected...except for this one huge, massive area.
That has a lot of financial incentives.
So the commercialization of other people's work now doesn't involve IP laws... seems wrong.
There's no protection against someone *reading* your published work and *learning* something from doing so, yet that is essentially the protection the authors were asking the court to find.
Imagine if you committed a copyright violation because you learned something from the book and used that knowledge later.
I have a memory, which I cannot substantiate via Google, of some woman who headed a copyright association in the late 1990s or early 2000s giving a lecture about the evils of lending libraries not paying royalties. Whatever point she had in that regard was swamped by the extremity of the rest of her remarks. By now, all I remember is tons of jokes about whether parents who read to their children would have to buy a separate copy for each child, whether it was legal to sing in the shower or in your car, whether it was legal to write reviews, or to quote movie lines with friends. But I can't confirm any of this with any search engine.
Cory Doctorow had a Young Adult science fiction story where everything you heard, saw, wrote, or said was monitored by some brain implant and every copyrighted word automatically deducted a micropayment from your bank account.
The problem is how these models will create content or share information that might be paywalled. It's one thing to access sources and quite another to put no guardrails on how they're used.
What guardrails exist for humans who do such things?
Humans can be sued if their work is too derivative.
Odds are LLMs will not face similar risks.
EDIT: Mind you, I'm all on board with slashing IP law nearly to nothing.
Copyright protects literary content. It doesn't protect knowledge. It doesn't protect literary techniques.
Copyright is now meaningless.