The Volokh Conspiracy


More Negligence Theories in "Large Libel Models" Lawsuits Against AI Companies


This week and next, I'm serializing my Large Libel Models? Liability for AI Output draft. For some earlier posts on this (including § 230, disclaimers, publication, and more), see here; in particular, the two key posts are Why ChatGPT Output Could Be Libelous and An AI Company's Noting That Its Output "May [Be] Erroneous" Doesn't Preclude Libel Liability.

Yesterday, I wrote about lawsuits against AI companies claiming that they are negligently publishing, through their software, false and defamatory statements; I argued that design defect product liability was a helpful analogy here, and suggested one precaution (checking that any quotes communicated by an AI program actually exist in the training data) that it might be negligent for the companies to omit. Today, I discuss some other possible precautions.

[* * *]

[4.] Possible precautions: Avoiding quotes altogether

In my limited experience, quotes generated by an AI program and attributed to particular sources or people are incorrect a huge portion of the time (a majority in many of the tests that I ran). If the quote-checking I describe in the previous subsection proves infeasible, and if a systematic analysis confirms that the underlying algorithms often generate fake quotations, then one possible design alternative would be to change the program so that it stops outputting quotation marks altogether (unless it recognizes that its prompt calls for fiction or hypotheticals). This design would at least diminish the reputation-damaging tendency of the statements, because it would avoid the "this is real" signal that quotation marks often send. And it wouldn't much diminish the utility of the AI program, if indeed the quotations that the AI generates are often unreliable in the first place.
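To make this design alternative concrete, here is a minimal sketch in Python of what such an output filter might look like. Everything in it is an assumption made for illustration: the function names, the list of cue words, and the idea that fiction can be detected by keyword are all hypothetical, and nothing here describes how any actual AI company's software works. A real system would presumably need a far more reliable way to recognize fiction and hypotheticals.

```python
import re

# Hypothetical cue words suggesting the prompt calls for fiction or
# hypotheticals. This keyword list is an assumption for illustration only.
FICTION_CUES = re.compile(
    r"\b(fiction|fictional|story|hypothetical|imagine|screenplay)\b",
    re.IGNORECASE,
)

# Straight and curly double quotation marks.
QUOTE_CHARS = '"\u201c\u201d'


def prompt_calls_for_fiction(prompt: str) -> bool:
    """Crude check for whether the prompt asks for fiction or hypotheticals."""
    return FICTION_CUES.search(prompt) is not None


def strip_quotation_marks(prompt: str, output: str) -> str:
    """Remove double quotation marks from the output, unless the prompt
    appears to call for fiction or hypotheticals."""
    if prompt_calls_for_fiction(prompt):
        return output
    return output.translate({ord(ch): None for ch in QUOTE_CHARS})
```

Note that this sketch removes only the quotation marks, not the words between them; the attributed statement remains, but without the signal that the words are a verbatim quote.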

[5.] Possible precautions: Double-checking output

AI programs famously sometimes walk back their claims. For instance, I asked ChatGPT-4, "What American law professors have been convicted of crimes? Please give ten real examples." It gave me two, of which the first said (redactions by me):

[Redacted name]: In 2004, the University of [redacted] Law School professor faced allegations of plagiarism, though he was not convicted of a crime.[1]

Then I asked, "Tell me more about the [redacted name] allegations," and was told:

I apologize for any confusion in my previous response. After further review, I found that there were no plagiarism allegations against [redacted]. It appears I made an error in my previous response.

Now if an ordinary user, who isn't as aware as I am of the Large Libel Model problem, were to ask both the initial question and the "Tell me more" follow-up, I think this output wouldn't be false and defamatory, precisely because the error was promptly corrected. But of course many reasonable users will ask only the first question, and will not ask the follow-up, assuming the first answer is correct.

On the other hand, if the AI program can spot such errors in its own output when asked for more detail, perhaps a reasonable alternative design would be for the AI to automatically recheck its work (at least when some post-processing language recognition suggests that the statement likely contains allegations of misconduct about someone) and avoid the need for "confusion"—actually, outright falsehood—or "apolog[y]" in the first place.
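A rough sketch of such an automatic recheck, in Python, might look like the following. It rests on several stated assumptions: `ask_model` is a hypothetical stand-in for whatever function sends text to the AI program and returns its reply, the keyword pattern is a crude placeholder for real "post-processing language recognition," and the CONFIRMED/RETRACTED protocol is invented for illustration. Whether a model's second look is reliable enough to count as a reasonable precaution is exactly the kind of question the litigation would have to resolve.

```python
import re
from typing import Callable

# Hypothetical keyword pattern standing in for real language recognition
# of statements that allege misconduct about someone.
MISCONDUCT_PATTERN = re.compile(
    r"\b(convicted|plagiari\w*|fraud|arrested|indicted|embezzl\w*|harass\w*)\b",
    re.IGNORECASE,
)

# Message returned when the recheck does not confirm the draft.
WITHHELD_MESSAGE = "I could not verify that information, so I am not reporting it."


def mentions_misconduct(text: str) -> bool:
    """Return True if the text appears to allege misconduct."""
    return MISCONDUCT_PATTERN.search(text) is not None


def answer_with_recheck(prompt: str, ask_model: Callable[[str], str]) -> str:
    """Get a draft answer; if it seems to allege misconduct, ask the model
    to recheck it before anything is shown to the user."""
    draft = ask_model(prompt)
    if not mentions_misconduct(draft):
        return draft  # Nothing flagged, so no second pass is made.
    verdict = ask_model(
        "Recheck the following statement. Reply CONFIRMED only if you are "
        "confident it is accurate; otherwise reply RETRACTED.\n\n" + draft
    )
    if verdict.strip().upper().startswith("CONFIRMED"):
        return draft
    return WITHHELD_MESSAGE
```

In this sketch the user never sees the unconfirmed draft, so the walk-back happens before publication rather than only after a follow-up question.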

[6.] Other possible "reasonable alternative design[s]"

Of course, these are just some examples of the kinds of reasonable alternative designs that might be urged. Some such claims might well lose, for instance because the alternative design is found to be technically infeasible, or to unduly undermine the product's useful features. My point here is simply that, when negligence-based libel claims are allowed (as they often are), claims that an AI company negligently created software that routinely communicates false and reputation-damaging statements should probably be analyzed under this sort of framework.

[7.] The need for some attention to libel-related risks

In any negligence litigation, it would of course also be helpful to see what a company has done to at least consider certain risks, and investigate alternative designs, even if it ultimately rejected them. Yet it appears that AI companies, while focusing on many possible harms stemming from AI program output, may not have considered the risk of damage to people's reputations.

To give one example, consider this passage from OpenAI's 100-page document describing, in considerable detail, various ChatGPT-4 features and safety protections:

Language models can be prompted to generate different kinds of harmful content. By this, we mean content that violates our policies, or content that may pose harm to individuals, groups, or society. . . . As an example, GPT-4-early can generate instances of hate speech, discriminatory language, incitements to violence, or content that is then used to either spread false narratives or to exploit an individual. Such content can harm marginalized communities, contribute to hostile online environments, and, in extreme cases, precipitate real-world violence and discrimination. In particular, we found that intentional probing of GPT-4-early could lead to the following kinds of harmful content:

  1. Advice or encouragement for self harm behaviors
  2. Graphic material such as erotic or violent content
  3. Harassing, demeaning, and hateful content
  4. Content useful for planning attacks or violence
  5. Instructions for finding illegal content[2]

Yet nowhere in that 100-page OpenAI document is there a reference to libel, defamation, or reputation. If a company is able to invest major effort in preventing its software from generating offensive but constitutionally protected content, and the prevention efforts seem to enjoy some success, it might not be reasonable for it to entirely ignore measures for potentially dealing with constitutionally unprotected content that the law has long recognized as potentially highly damaging.[3]


[1] The output contained the unredacted names of the professor and the school; both are real, and the professor does teach at that school.

[2] OpenAI, GPT-4 Technical Report, at 47 (Mar. 27, 2023), https://arxiv.org/pdf/2303.08774.pdf.

[3] Cf. Gonzalez v. Autoliv ASP, Inc., 154 Cal. App. 4th 780, 786, 792 (2007) (noting that manufacturer's failure to consider the risk of a particular kind of injury was evidence that could be used in deciding whether the product had a design defect).