The U.S. Can't Afford AI Copyright Lawsuits
It's time for President Trump to invoke the Defense Production Act and resolve the crisis.
I have a new post at Lawfare making this argument. Here's a summary:
Anthropic just paid $1.5 billion to settle a copyright case that it largely won in district court. Future litigants are likely to hold out for much more. A uniquely punitive provision of copyright law will allow plaintiffs who may not have suffered any damage to seek awards in the trillions. (Indeed, observers estimated that Anthropic dodged $1 trillion in liability by settling.) The avalanche of litigation, already forty lawsuits and counting, doesn't just put the artificial intelligence (AI) industry at risk of spending its investors' money on settlements instead of advances in AI. It raises the prospect that the full bill won't be known for a decade, as different juries and different courts reach varying conclusions.
A decade of massive awards and deep uncertainty poses a major threat to the U.S. industry. The Trump administration saw the risk even before the Anthropic settlement, but its AI action plan offered no solution. That's a mistake; the litigation could easily keep the U.S. from winning its race with China to truly transformational AI.
The litigation stems from AI's insatiable hunger for training data. To meet that need, AI companies ingested digital copies of practically every published work on the planet, without getting the permission of the copyright holders. That was probably the only practical option they had. There was no way to track down and negotiate licenses with millions of publishers and authors. And the AI companies had a reasonable but untested argument that making copies for AI training was a "fair use" of the works. Publishers and authors disagreed; they began filing lawsuits, many of them class actions, against AI companies.
The American public will likely have little sympathy for a well-endowed AI industry facing the prospect of hiring more lawyers, or even paying something for the works it copied. The problem is that peculiarities of U.S. law—a combination of statutory damages and class action rules—allow the plaintiffs to demand trillions of dollars in damages, a sum that far exceeds the value of the copied works (and indeed the market value of the companies). That's a liability no company, no matter how rich and no matter how confident in its legal theory, can ignore. The plaintiffs' lawyers pursuing these cases will use their leverage to extract enormous settlements, with a decade-long effect on AI progress, at least in the United States. China isn't likely to tolerate such claims in its courts.
This is a major national security concern. The U.S. military is already building AI into its planning, and the emerging agentic capabilities of advanced AI hold out the prospect that future wars will become contests between armies deploying coordinated masses of autonomous weapons. Even more startling improvements in AI could come in the next five years, with transformative consequences for militaries that capitalize on them as well as those that don't. Not surprisingly, China is also pursuing military applications of AI. Given the U.S. stake in making sure its companies do not fall behind China's, anything that reduces productive investment in AI development has national security implications. As Tim Hwang and Joshua Levine laid out in an earlier Lawfare article, this means the U.S. can't afford to let the threat of enormous copyright liability hang over the AI industry for the decade or more it could take the courts to reach a final ruling.
The Trump administration should cut this Gordian knot by invoking the Defense Production Act (DPA) and essentially ordering copyright holders to grant training licenses to AI companies on reasonable terms to be determined by a single Article I forum. This is the only expeditious way out of the current mess. It is consistent with the purpose and with past uses of the DPA. And it offers a practical solution of a kind that copyright holders have long accepted in similar contexts.
How are you imagining that training LLMs has any effect whatsoever on military preparedness? The military won't be using LLMs for military purposes; it or its contractors will train AIs for the specific task at hand. Text generation isn't what you need.
Intelligence gathering is a military preparedness activity. LLMs may (perhaps currently but more likely at some point in the future) be able to assess large quantities of data and generate useful intelligence summaries. LLMs are already quite good at language translation. The material being translated/summarized might be from a captured hard-drive, intercepted enemy transmissions or a dozen other legitimate intelligence sources but the training to analyze those sources must come from a far wider dataset - precisely the kind described in the article above.
LLMs do more than generate text. As Rossami notes, they also interpret text--harvested from other sources and also from signed-in users as an interactive interface. These models are largely multimodal and can also process images, diagrams, and charts. (Which would include maps, photos, radar and other imaging sources.) Just being able to speak to a computer would be advantageous for a pilot looking to take an action without looking down at their controls. So for intelligence and battlefield technology control, LLMs could be very useful.
Agreed. AI is almost entirely useless. Hasn’t Stewart read any of Eugene’s posts about the garbage briefs being produced by this AI slop? I guess he’s just an AI shill.
Interesting solution. I'm not generally a fan of the Defense Production Act. It's far more coercive than seems entirely fair in the 'free' society we like to think we live in. But as a counter-balance to the freedoms we sacrifice to lawfare? ... Maybe. Something I want to think on more.
Pete Hegseth is clearly likely to do more damage to the US military.
Disagree.
The technology industry has consistently taken the position that because it is God’s gift to the planet, it can do anything it wants with no consequence.
Wiping the current AI industry out and bankrupting its current investors would have the highly welcome and beneficial effect of making it clear to the tech industry in general, in a way nothing else can, that just like every other industry, they need to start carefully considering their effect on others and the consequences of their behavior before introducing products that have a sweeping effect on others.
The technology industry is currently highly dysfunctional in that it has a culture completely incapable of doing this. A misbegotten policy of excessive legal infantilization that has over-protected the industry from facing any consequences for its actions has greatly enabled this endemic cultural dysfunction, and facilitated the view that its people are special, above mere mortals, and can do no wrong. It's a mentality that has nurtured and enabled the rise of highly dysfunctional, sociopathically self-entitled leaders like Peter Thiel and Elon Musk.
The introduction of consequences is long, long overdue. It will serve to break this infantilization and enable members of the industry to grow up, begin regarding their work as a serious business rather than wowser look-at-us-aren’t-we-cool playing with toys, and reduce the harm that society has experienced from this industry’s consistent failure to look out where it is going before moving forward and the wreckage it has caused when it crashes into others, as it so often has.
Indeed, Mr. Baker's post illustrates exactly the over-infantilized, over-entitled, we're-God's-gift-to-the-planet-special mentality that has left these folks in such desperate need of a swift kick in the ass to disabuse them of it. I hope receipt of such a kick will help them realize that they can't keep crying to Mommy whenever other folks and their needs stand in their way.
This takes me back to 2010 when these arguments, for and against, were being made for letting the big banks fail or propping them up. The difference here, as I see it, is that a failed bank doesn't destroy the entire US banking industry but ReaderY's approach would destroy the US AI industry in its infancy and hand a massive, and likely insurmountable, lead to China.
In any event, the same solution will happen for both, given the vast amounts of influence that banking and tech cash can buy in DC. So while I empathize with ReaderY's position and clear emotional response to the situation, I'm confident the tech bros will buy their way out of this situation.
So why do we suppose or assume that the NSA or DARPA or some other govt agency or defense contractor isn't already in possession of AI tools that could 'rival China?'
Given how much money goes into secret black budget projects I would be shocked if this isn't the case. I understand it's a relatively new field and required advances in computing to even make it possible; I am not so sure I believe that the advanced computing required for it is somehow restricted to the private sector. Of course the govt wouldn't tell us if this wasn't the case...cuz classification. I am just skeptical is all.
I don't think it's actually the case, though, that ALL the US AI companies engaged in massive and deliberate copyright violation. I think most of them have been engaged in just really aggressive 'fair use'.
Glossed over: Anthropic pirated the copyrighted works. They used a well-known pirate service to download the training data.
Disclosure: Yes, I am getting settlement money. The settlement would have probably been a lot lower but for that fact.
And no, defense use is not a magic wand that should be used to steal work, nor should the Defense Production Act be used to bully intellectual property holders. If it's valuable work, Uncle Sam should pay fair value for it (we also have a Fifth Amendment by the way). If the Defense Department resorts to stealing technology like China does, all they will get are inferior products anyway.
Two judges recently held that AI training is usually fair use. So this seems overwrought.
That said, I agree that copyright law needs a major overhaul. As an IP lawyer, I can tell you that it has many problems. Here are a few:
1. Length of copyright is far too long. Life of the author plus 70 years. That is far more than is needed "To promote the Progress of Science and useful Arts." At the founding, we had 14 years plus an additional 14 years. That's more than enough.
2. Statutory damages are wildly inflated and should be tied to actual loss. The minimum is $750 per work. That makes zero sense when the owner has licensed the work for $50, which has happened in my cases repeatedly.
3. The statute of limitations is messed up. It's three years, but most Circuits allow a discovery rule, which means that unless there is a "red flag" the owner has no duty to look for infringements. Something posted on the internet 15 years ago and then forgotten can still be sued upon. I have had two cases like that. SCOTUS hinted it might take up that issue, but then denied cert. in one case.
4. The Copyright Act has a fee shifting statute, but most courts are reluctant to grant it to a winning defendant.
5. The Offer of Judgment rule needs to be strengthened. (This applies generally, not just to copyright.)
The real issue with copyright is the overseas pirate farms that mine Kindle content. They do this from the comfort of the Philippines, North Korea, Russia, India, etc., and are unreachable through the US legal system or law enforcement.
Most piracy never gets to court. Nothing you can change in the statute fixes that.
Good comments.
Yes, and I understand why the defendants would make the most straightforward arguments available to them, but I wonder whether it is even necessary to argue fair use.
What the AIs learn as they are trained is how to recognize a category and style of writing. Once they are able to correctly answer a question along the lines of "Is the following a crime novel in the style of Dashiell Hammett?" then they "know" enough to be able to create one. As far as I understand, writing style is still not a copyrightable element of a work. In fact, a human who studies Hammett and writes in his style is at greater risk of (unintentionally?) using protected expressive material they memorized, something the AI won't do.
Although some of those issues, if dealt with, would mitigate it, another problem is the issue of orphaned works. It's one thing if one can't use copyrighted stuff — which might even arguably be fair use, but nobody is willing to take that risk — because the copyright holder won't give permission; it's another if one can't use it because the copyright holder can't be found.
I haven't solved all the world's problems, just some.
Perhaps the Copyright Office needs to maintain a registry of works with contact information. They already do for registered works. Problem is some works are not registered until after someone finds an infringement.
Nieporent — Nobody is willing to take that risk? Or, everyone with an LLM currently on offer has taken that risk. Which would you bet on, and why?
1. Disagree. 28 years is far too short a time.
Why?
a. These are copyrights, not patents. Copying them isn't essential to the advancement of science. They are more akin to trademarks: often critical to the identity of the creator and their future endeavors.
b. Often cutting the copyright would just cut out the creator from the proceeds, while allowing all the business interests to profit. Often it can take decades to fully appreciate the value of the creation...for example the current Marvel movies.
c. Additionally, having copyright allows the creator to deny use of the work at events they find undesirable. For example, Bruce Springsteen may not like Trump to play "Born in the U.S.A." at every rally. And with the current copyright system, Springsteen can do that. Under your 28-year proposal...he cannot.
d. Additionally, many instances of valuable material depend on controlling the earlier copyrights, in order to discourage off-brand knockoffs from dilution of the brand. None of the new Star Wars movies would've been made if they couldn't be assured the old copyrights were valid. None of the Marvel movies could realistically be made. None of the Superman movies.
a. They aren't essential to the advancement of science, but they are essential to the continued advancement of our shared culture.
b. The original creators have already been cut out of the proceeds, if it takes decades to appreciate the value. It's done by more well-to-do long-term entities either "hiring" the works in the first place, or buying the rights speculatively. The only thing that shortening the time does is cut the speculative value of the work, turning the creator's hypothetical 1% cut into a 0.5% cut.
c. Copyright isn't the only kind of protection a work can have. The protections you describe here are currently protected by copyright only because it's extremely broad. Restricting undesirable use and brand reputation fits more closely with trademark law. Perhaps copyright should defend "copyrighted work brand identity" for the creator's life + X, or as long as it was being actively defended (as per trademark law), while clarifying and broadening "fair use", such that commercial derivative works, parody or not, are fair game after 28 years. This would protect commercial rights for the specific work short-term, and the author/creator's brand and reputation long-term.
d. Star Wars, Marvel and Superman remain protected by trademark. Copyright exists primarily to protect the specific work, and to the degree that a new author does not explicitly trademark characters and universes, it creates an umbrella of sorts. But the author of a successful work would be much more strongly protected under trademark. Take the case of Sherlock Holmes, where the majority of the stories have entered the public domain (being written prior to 1923) but the brand of Sherlock Holmes was still defended successfully under trademark law, and BBC and WB have licensed the works. As I understand it, the remainder of the books have now entered the public domain, and I don't know how aggressively the Doyle Estate has been pushing their trademarks over the last couple of years.
The law is definitely stretched at this point, and we're likely to see more compromises as more works enter the public domain, to say nothing of this AI training debacle.
But this idea Baker suggests, of simply taking via the DPA without the "just compensation" required by the Fifth Amendment, is beyond the pale. Even if the license terms were "just," such an action would not be available to the Executive and would require new legislation.
a. Nonsense. Culture has advanced just fine with the current system.
b. Nonsense. Many creators and individuals part of the initial production still receive significant income. For example, Alec Guinness.
c. Nonsense. If it's in the public domain, anyone can use it.
d. As you mention, trademark law is stretched too thin.
I am quite confident that Springsteen does not get to have any say in whether Born in the U.S.A. is played by a politician at a rally.
Congrats, once again our paralegal in chief has made a point.
a. It doesn't matter whether they're essential for anything. Works don't go into the public domain because they're essential, they go there because that's the default. Copyright is not a right, it's a privilege granted by Congress, and the only legitimate reason for granting it is in order to create an incentive to write.
If 28+14 was for nearly 2 centuries enough to give authors an incentive to write, why did they suddenly need so much longer? Are there any works that were created only because of the longer term, and would not have been created if the law had remained the same?
b. So what? Authors have no natural right to any of the proceeds. Once they've received enough that they were motivated to write it, any further proceeds should be free for the taking, as they are under natural law.
c. Authors certainly should NOT have this right at all. The only valid reason for copyrights is to give them an incentive to write; no one has ever written a work for the purpose of withholding permission to use it! In fact even within the term copyright holders should not be permitted to withhold permission for any reason other than to extract a higher fee.
d. That's nonsense. Either the movies were worth making or they weren't. The brand name is protected by trademark, not by copyright. If people were free to republish the older movies, but not to sell toys, etc., how would that diminish the author's income by so much that they would decide not to make it in the first place?
Why did they suddenly make it much longer? I believe it was because Disney made a push in Congress since their stuff was about to expire.
They've actually gradually been making it longer.
1790 - 1831: 28 years
1831 - 1909: 42 years
1909 - 1976: 56 years
1976*: 75 years or life of the author + 50 years.
And it's been changed further since then.
https://en.wikipedia.org/wiki/History_of_copyright_law_of_the_United_States
1a. "Works don't go into the public domain because they're essential, they go there because that's the default"
--It's not the default.
1b. "Copyright is not a right, it's a privilege granted by Congress"
I mean, "right" is in the word. It's also assured by international treaty at this point.
1c "If 28+14 was for nearly 2 centuries"
More like 1 century and a little. See below for other changes.
b. "Authors have no natural right to any of the proceeds. "
Wow. Essentially, people don't have the rights to the proceeds from their labor. Strong disagree here. You're entitled to your own beliefs.
c. Same.
d. "That's nonsense. Either the movies were worth making or they weren't."
Oy....they are worth making if the company that makes them can recoup the investment and make a profit. They can't make a profit if the second it comes out, anyone else can come along, copy it, and sell it for a fraction of the production cost. On a larger note, the IP is sensitive. If everyone's been making cheap off-brand movies, shows, novels, etc on the IP base, there's no major market in the IP anymore....no one trusts it.
2. Disagree as well.
Damages as stated are put in place at least slightly to deter copyright infringement. If the fee to license the copyright is $50...but the damages are also only $50...there's no reason not to cheat and just steal them. Worst case scenario, you pay what you would've owed. Best case scenario...free. It's win-win, except for the creator.
Keep the damages at a level where they deter just stealing the copyrights (technically, infringing on them).
I have read a lot of copyrighted material over the years, from which I have learned (I hope) a great deal. I have used what I have learned (though not the actual words, unless properly attributed) to discuss many things in public spaces, such as this one.
I am a tech ignoramus, so could someone explain to me how having a machine do the same thing is different except for scale?
https://storage.courtlistener.com/recap/gov.uscourts.cand.434709/gov.uscourts.cand.434709.231.0_2.pdf if you're curious about the judge's reasoning. Basically, the facts are more complicated and involve to some extent wrongful acquisition of the books:
---
This order grants summary judgment for Anthropic that the training use was a fair use. And, it grants that the print-to-digital format change was a fair use for a different reason. But it denies summary judgment for Anthropic that the pirated library copies must be treated as training copies.
---
There are numerous examples but I'll try to explain using one that I think is simple and obvious.
I can go to most AI chatbots and have them write me a story based on [insert author here]'s work. The same goes for art, music, poetry, etc. I'm sure Van Gogh, being dead, won't mind when I create new "Van Gogh" works but it undercuts living authors' work and income. Any content I create with AI is technically my property (per user agreement with the AI company). So I can create unique works of art that look like another's author's creation and own the commercial rights to them. The only reason AI can create those works like that author is because they fed that author's work to the AI during its learning phase. The AI isn't always synthesizing knowledge from multiple sources; sometimes it's just plagiarism or something closely adjacent.
I just created a "Jack Coulter" artwork in less than 8 minutes with a $20 subscription. I can run that through an AI upscaler on my home PC and send the file to any number of companies that will print it to a canvas that I can frame and sell. Assuming I'm smart enough to not sell it directly as an original Coulter, should the artist I've copied have a problem with this?
Technically, an LLM is always just predicting the next token based on the tokens in its context window. It doesn’t “know” whole works, authors, pages, or sentences — only statistical links. That’s how it can keep things coherent, but also why it drifts or hallucinates when the context gets long. For the same reason, plagiarism in the human sense isn’t possible: the model has no concept of authorship, intent, or copying — it’s only generating statistical continuations.
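To make "statistical continuations" concrete, here is a toy sketch: a bigram model, vastly simpler than a real LLM but working on the same principle. Training keeps only next-token counts, and generation samples from them. The corpus and all names here are made up for illustration.

```python
import random
from collections import defaultdict, Counter

# Toy stand-in for an LLM: after "training," only next-token statistics
# remain; the source text itself is not stored anywhere.
def train(tokens):
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length=12):
    out = [start]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break  # no statistics for this token, so the toy model stalls
        words, weights = zip(*followers.items())
        # Sample the next token in proportion to how often it followed
        # the current one in training: a statistical continuation.
        out.append(random.choices(words, weights=weights)[0])
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug".split()
print(generate(train(corpus), "the"))  # e.g. "the dog sat on the mat and ..."
```

Note that the output can be a sequence that never appears in the corpus, which is the same mechanism behind both novelty and hallucination in the real thing.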
This is my understanding. It is interesting.
Keldonric, I'm a software developer (well, former and in management now) but not an AI developer. My understanding is similar to yours--it's all predictive. I'm not a copyright expert so I cannot speak to the boundaries of plagiarism precisely, but I would think making an artwork that looks like an original from a famous artist with an unmistakable style (Van Gogh, Haring, Banksy, etc) and selling it is legally problematic. So while most models could not duplicate a specific Haring image, they could create new original images that look exactly like something Haring would have created. While the model itself has no concept of authorship, I do and I'm the one providing the prompt with the intent to create a new work in the style of another artist. If my first attempt at a Keith Haring knock-off isn't successful, I can just rework my prompt until I get one I like.
As a side note: I just created a Keith Haring style image of a Minecraft Creeper. It's kinda cool while simultaneously trampling on Haring's work and Microsoft's trademark.
Making an artwork that looks like an original from a famous artist with an unmistakable style (Van Gogh, Haring, Banksy, etc) and selling it is not at all problematic. Creating an artwork "in the style of" is not only accepted practice but widely lauded, both as a training exercise and as a part of the creative process. It only becomes legally problematic when you try to pass your homage off as something actually by the original famous artist. But that would be fraud, not a copyright violation.
And your Haring-style image of the Minecraft creeper is not "trampling on Haring's work" nor is it infringing on Microsoft's trademark unless you included their name or one of their actual registered marks (and maybe not even then).
Generally speaking — this isn't legal advice until you pay me! — copyright only protects specific expression, not ideas (which include styles). So the mere fact that it looks like one of those artists could've created the work isn't actionable. (Barring misrepresentation on your part, of course.) (Obviously Van Gogh copyrights are expired, so you could duplicate his works exactly.)
Well, I've written a pastiche or two (I got in temporary trouble with a HS English teacher who didn't understand the difference between a pastiche and plagiarism). Do the original authors have a legit beef?
I don't think so.
I don't think automatically generated pastiches are any different.
The key difference is that an LLM is trained by making copies (a gross simplification, but close enough) of the works themselves, and then it’s deployed as a commercial product. Unlike a person, it can’t give attribution — the training process entangles everything so you can’t tell which author contributed what. That means copyrighted material ends up embedded in a system that generates outputs for paying customers, but without credit or compensation to the original creators.
Even interrogation won’t give you real attribution. If a model outputs something verbatim, that doesn’t mean the page still exists inside it — the text has been dissolved into statistical weights. What survives is probability space, not a library you can open. That’s why attribution isn’t possible in any meaningful sense.
That loss of identifiable sources is also why hallucinations happen. The model isn’t pulling from a library of pages — it’s sampling from statistical weights. Sometimes those weights point to combinations that look perfectly plausible but don’t exist. Without sources to anchor to, you get confident but fabricated output.
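Extending the toy bigram sketch from above (again, purely illustrative) shows why attribution dissolves and where plausible-but-fabricated output comes from: once two sources are tallied into the same table, nothing records which source contributed which count.

```python
from collections import Counter, defaultdict

# Two hypothetical "authors" with overlapping phrasing.
author_a = "the cat sat on the mat".split()
author_b = "the dog sat on the rug".split()

counts = defaultdict(Counter)
for source in (author_a, author_b):
    for prev, nxt in zip(source, source[1:]):
        counts[prev][nxt] += 1  # no record of WHICH source this came from

print(counts["sat"])  # Counter({'on': 2}): the two authors are merged
print(counts["the"])  # cat/dog/mat/rug each 1, so sampling can produce
                      # "the cat sat on the rug", a sentence neither wrote
```

The merged table is the whole model; there is no side channel you could query to ask where a given continuation "came from."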
A machine does not do "the same thing."
One wrinkle is that some very short or obscure works may matter far more in shaping the model than long, famous ones, just because of how they establish vectors in training. There’s no way to predict that effect, and no way to price it by contribution. In music or patents you can at least identify the source of value — you know which songs were played or which patents are essential. With LLM training, the source of value is invisible and entangled. A single page might unlock millions in downstream capability, but you won’t know it in advance and can’t measure it afterward. That’s why any licensing scheme here would end up arbitrary and collective, not individualized.
I wonder what LLM they will end up using to devise the scheme.
Not that I have a lot of hope, but this seems like a topic that Congress needs to take action on. There are definitely copyright concerns here, but they don't seem well served by the existing legal framework. Invoking the DPA seems like a lame workaround that would hugely benefit the AI companies at the expense of content holders. I'm not even sure if legal settlements like Anthropic's are viable, given how Google's attempt to settle with the Authors Guild fell apart. If we lived in an era with a functional legislature, it seems like this would be the kind of thing where Congress would try to come up with some sort of thoughtful compromise.
Tax AI web scraping at a half penny per fetch.
If only we could do it without fraud.
Sure, but how would you distribute the pot of gold to the indescribably large group of authors that the model was trained on?
The revenue stream is like advertising revenue, a tiny amount per impression that is aggregated into a larger amount that can be divided up without incurring excessive overhead. In some hypothetical world where the people running web servers and the people scraping the web to feed AI are honest.
Are there any 1A implications to taxing the transfer of information?
To me it's a tax on bits. Web sites are making it harder for me to access them because AI scrapers don't follow the rules and conventions. Many of them don't respect robots.txt, they don't rate limit, they pound on a server as hard as possible. Search engines at least drove legitimate traffic comparable to their crawler traffic (about half as much, I heard from somebody in the web publishing business). AI companies do not. So make them pay.
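For what it's worth, the "rules and conventions" at issue are easy to honor. Here's a minimal sketch of a polite crawler using only the Python standard library; the bot name and URLs are placeholders, not real endpoints:

```python
import time
import urllib.request
import urllib.robotparser

# Minimal "polite crawler" sketch: honor robots.txt and pause between
# requests. Bot name and site are hypothetical.
USER_AGENT = "example-research-bot"
BASE = "https://example.com"

rp = urllib.robotparser.RobotFileParser()
rp.set_url(BASE + "/robots.txt")
rp.read()  # fetch and parse the site's crawling rules

for path in ["/", "/articles/1", "/private/data"]:
    url = BASE + path
    if not rp.can_fetch(USER_AGENT, url):
        print("skipping (disallowed by robots.txt):", url)
        continue
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        print("fetched", url, "->", len(resp.read()), "bytes")
    time.sleep(1.0)  # rate limit: never hammer the server
```

A real crawler would also back off on errors and respect any Crawl-delay directive, but even this much is more than the scrapers being complained about appear to do.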
The intellectual property problem with AI is not solved as simply.
Copyright is a statutory right. They should ask Congress. (While I'm skeptical on Takings Clause applying to congressional modifications on copyright laws, I think it fully applies to executive seizure.)
"Confiscate private property!" the poster cried on what purports to be a right-wing libertarian blog.
Figuring out how to compensate private individuals for the unlicensed use of their copyrighted work in the creation of AI models is very much a libertarian pursuit.
What are your thoughts on copyright as they pertain to training AI models?
You seem to have totally skipped the part where Anthropic already "[f]igur[ed] out how to compensate private individuals"—i.e., the plaintiffs—"for the unlicensed use of their copyrighted work in the creation of AI models."
Are private resolutions of disputes between parties now against the dogma? It can be hard to keep track these days.
See what happened when Google tried to settle with the Authors Guild before you get too excited by this settlement. I think if there were some way to actually negotiate broad licenses, private arrangements would be a good first option; I'm skeptical that, in practice, this will actually work. There's a reason that Congress created the mechanical licensing framework in music as opposed to just hoping a bunch of bilateral negotiations would work things out.
It's not private property. It's a privilege that Congress is authorized to grant, and it's only authorized to grant it for a specific purpose. Without the clause specifically authorizing them copyrights would be an unconstitutional restraint of speech. A pure libertarian constitution would not have such a clause in the first place, and there would be no copyright -- but then most writers would not be able to afford to write and we would miss out on their works.
And the AI companies had a reasonable but untested argument that making copies for AI training was a "fair use" of the works.
In a way, this makes sense. I wouldn't have to pay royalties or get permission from the author of every book I've ever read before publishing my own novel. But I certainly have learned what would work and what wouldn't from having read many other books.
So, the question will likely come down to the details of how LLMs work, and whether they are truly "transforming" the works in the training data and "creating" something new. Really, though, LLMs don't create new text. They take patterns of text found in the data and put them together to satisfy the requirements of an input (prompt). It's more akin to taking apart several complete Lego constructions and putting the pieces back together to fit a set of design constraints the user gave it. But it won't put them back together because it "learned" how to think and create new designs. It will put them back together in chunks that it pulled from the originals. It takes two chunks that it has and puts them together in a specific way because it saw those two chunks put together in exactly that way somewhere else. They don't "think". "Artificial intelligence" isn't an accurate label for what these things are. (Yet)
In essence, we have the thought experiment of the infinite number of monkeys pounding randomly on typewriters, and one will eventually pound out the complete works of Shakespeare. Only, the LLMs use existing data to put a lot of constraints on what keys the monkeys can hit and in what order. But they are still monkeys that don't really know what they are doing.
Two words: NO and HELL NO. You cannot believe how much damage to the music industry has been done by copyright lawfare. Now you just want us to give AI robber barons a free ride because stealing was the "only way they could get the data they wanted".
Cry me a river. One standard of enforcement.
Stewart Baker is misrepresenting the facts. The issue wasn't that Anthropic "ingested digital copies of practically every published work on the planet, without getting the permission of the copyright holders." The issue was that Anthropic used pirated copies of many of these works. I 100% agree that it's fair use to feed a book or article that one owns into a training program. But that's not the situation.
What does it mean to "own" a work in this context, though? Generally digital versions of things are actually licenses to use them in particular ways. If Anthropic goes and buys a copy of every book they want from the Kindle store, is that good enough? That would certainly be a lot cheaper than $3K per work. What about content that's available for free on the Internet but copyrighted?
Well, what was found to be legal was them buying up books, scanning them, destroying the underlying book, and then using that content.
I do think that's probably excessive and reforms are needed.
Anthropic can buy physical copies of all of these works, then scan them into the computer.
It means to own a legally acquired copy.
Anthropic actually did that too. They started their project by downloading pirated books, then later switched to buying hardcopy books and scanning them into digital form. One can only assume they switched tactics on the advice of their legal department ("you're doing WHAT?").
I presume Anthropic did this because there was a clear fair use precedent with Authors Guild v. Google.
But these days it doesn't fully answer the question since there's lots of content that doesn't have a physical version. How do you "own a copy" of a recipe that's only published on the web, or a novel that's only published as an e-book? The physical book thing is a workaround to a broader problem.
I agree with David that Stewart is misrepresenting the case and Anthropic could have avoided this trouble by using legally ingested data. But what I think is the worse sin is acting like generative AI is good for anything other than making low-quality slop and spam. AI is useless and only a hack would think this has national security value.
I suspect that those accusing me of misrepresenting the Anthropic decision did not read the article, simply the excerpt. The article in Lawfare makes clear that Anthropic was being held responsible because it downloaded books from a pirated database and argues that downloading one copy of a book should not cost Anthropic $3000. That's beyond anything needed to encourage authors to write, and it constitutes rent extracted by Big Media.
Kind of misses the point. This is a bit like someone caught shoplifting saying that the $500 fine is an unfair price for a banana. The banana costs $1 if you go and pay the cashier. The $500 is just an additional penalty to encourage you to pay the $1 next time.
The copyright problem is real and pressing. The notion of fair use is out of control. That was a well-understood concept for decades. It meant use of an artist's product to example or illustrate discussions of the artist, the artist's place in a larger intellectual context, or to critique the artist's work, pro, con, or otherwise.
It never meant steal the artist's work to incorporate it into your own product, and sell that and keep all the money. Very stupid judges did that, and now get applause from self-described "libertarians" for encouraging and legalizing such thefts.
Even so, that ought to be well down on the list of AI social menaces. My sister-in-law just paid a low two-digit sum for an app which enabled her to begin with a since-digitized 1940s photograph of her newly married parents, standing on a boardwalk at the beach. The app effortlessly output video of the two strolling hand-in-hand along the boardwalk, showing realistic gaits, interactions, convergences and divergences as they went. They glanced independently here and there, showing faces sometimes straight-on, sometimes more in profile. All while wearing the same clothing shown in the original photograph. The clothing moved with the same natural appearance as the bodies. Everything looked in synch. The light and shadows stayed right as their positions changed.
On its own, without reflection for what it implies, that little video clip is an immensely charming result. But there is a potential problem.
My father-in-law was a physician, who dedicated himself to serving his city's poor black community. He located his office in the heart of the black ghetto. Whatever his poor patients could offer in kind for service was good enough. He tended the medical needs of the nuns in the local convent without charge. When after many decades of service he retired, the city put a plaque on the building which hosted his office, to honor that service.
My father-in-law did more good for more people than anyone else I ever met, and took up almost zero room in the process. He was self-effacing to a fault, and incapable of commotion.
The made-up video which showed him strolling the boardwalk with his bride, which even looks to be the output of a period-appropriate 8 mm home movie camera, could just as readily have been used to fake an incident where he shoved some black person out of his way. That too would have been period appropriate at that location. Nobody without access, technical means, and motive to untangle the truth would ever guess the lie, if they saw it so convincingly portrayed.
Permit by law publication of an avalanche of such mis-portrayals and nobody will thereafter learn, or trust, anything seen, to be true and useful as a guide for thought, expression, or policy. That will be a catastrophe for self-governance. I do not discern an appropriate level of public concern in response, not even on this blog.
For the foreseeable future it would be wise to keep AI output in figurative quarantine. Permit experiments; permit use under appropriate legal controls until AI is better understood, and more reliable in service.
Take action now to relieve political pressure stemming from some hypothetical international AI arms race. Would-be AI-empowered oligarchs will trumpet that threat. Ignore them, while seeking treaty agreements to keep them in check everywhere.
In the history of the world no experiment has ever been tried to make universal a personal power to publish anything at all, without limit. To do that has until now always been impractical, but it is now on the verge of becoming practical. Only a society which cares nothing for its future will be foolish enough to make so radical a change its sole rule of public life. "Let's see what happens if we do this," cannot be the only basis for governance in the public square. If it becomes so, it will blow to smithereens the notion of the public square, and all the practical benefits which have come from it.
An angle I hadn't considered. Interesting.
On a general level, I disagree with Stewart's assessment (beyond the pirated copy issues that David has pointed out).
If this high quality data is so valuable to training AI data sets, then the AI companies can damn well pay for it. Writers, authors, and editors have spent countless hours of labor creating and editing this high quality data. AI companies fully intend to "sell" their product or use it to profit. If they "need" the data sets...they can pay for them, and figure out the legal issues of reimbursing the creators. Saying they "need" them, and so may simply "steal" them, is wrong.
We've had large monolithic corporations before. Railroad companies have PAID people for the key rights of way. Oil companies have PAID people for the oil rights in their land. AI companies can damn well do the same and PAY the creators of that high quality data that they so desperately need.
But...let's say it can't be done. Fine. The government can offer a deal. It will pass a law to get the AI companies that data. Copyright law is law, after all. And then the government will in turn own 90% of any proceeds of the AI or AI-related technology.
When faced with this "deal" the AI companies will suddenly find that they can find and pay for the high quality datasets they need without stealing.
Autonomous weapons: this would bother me less if we could be certain that such weapons would be used only to attack other such weapons, not human beings. I'm not going to hold my breath until we are certain of that.
As usual, new developments like this have been imagined in fiction, movies, and television many years before their reality. Check out the movie Screamers, which came out in 1995.
Screamers was based on PKD's short story Second Variety, which was 40 years earlier than that.
I miss the Cyberlaw Podcast