The Volokh Conspiracy

Mostly law professors | Sometimes contrarian | Often libertarian | Always independent

AI in Court

Corpus Linguistics, LLM AIs, and the Future of Ordinary Meaning

Our draft article shows that corpus linguistics delivers where LLM AI tools fall short—in producing nuanced linguistic data instead of bare, artificial conclusions.


Modern textualism is built on at least three central pillars. Textualists credit the ordinary meaning of the language of law because such meaning: (1) can reasonably be discerned by determinate, transparent methods; (2) is fairly attributable to the public who is governed by it; and (3) is expected to constrain judges from crediting their own views on matters of legislative policy.

To fulfill these goals, textualist judges are expected to show their work—to cite reliable evidence to support their conclusions on how legal words or phrases are commonly used by the public. Judicial intuition is a starting point. But judges who ask the parties and the public to take their subjective word for it are not engaged in transparent textual analysis; cannot reliably be viewed as protecting public reliance interests; and may (at least subconsciously) be advancing their own views on legislative policy.

The Snell concurrence acknowledges these concerns (as do the academic pieces it relies on). But the tools it advances (LLM AIs) fall short of fulfilling these key premises. Corpus linguistic tools, by contrast, are up to the task.

We show how in our draft article. In Part III we investigate the empirical questions in Snell through the tools of corpus linguistics. We performed transparent searches aimed at assessing (a) how the term "landscaping" is commonly used in public language; and (b) whether people commonly use the term "landscaping" when they speak of the installation of in-ground trampolines. Our results are granular and nuanced. They stand in contrast to the conclusory assertions of AI chatbots—conclusions that gloss over legal questions about the meaning of "ordinary meaning" and make it impossible for a judge to lay claim to any sort of transparent, determinate inquiry into the ordinary meaning of the language of law.

Our first search was aimed at uncovering empirical evidence on the conceptual meaning of "landscaping"—whether and to what extent this term is limited to improvements that are botanical and aesthetic or instead encompasses non-botanical, functional improvements. We sought empirical data on that question by searching for all uses of the term "landscaping" in the Corpus of Contemporary American English. Our search yielded 2,070 "hits" or passages of text containing the term. We developed and applied a transparent framework for "coding" a random sample of 1,000 of those hits (as detailed in Part III.A of our article). And we found that "landscaping" (a) is sometimes used to encompass improvements of a non-botanical nature (in about 20% of the passages of text we coded), and (b) is regularly used to extend to improvements for a functional purpose (in about 64% of codable texts).

Our second search considered the ordinary meaning question from the standpoint of the "referent" or application at issue in Snell. It asked whether and to what extent "landscaping" is the term that is ordinarily used to refer to installation of an in-ground trampoline. Here we searched the 14-billion-word iWeb corpus for references to "in-ground trampoline," "ground-level trampoline," "sunken trampoline," and more (as detailed in Part III.B of our article). And we found that "landscaping" is a common term—if not the most common term—used to describe the type of work involved in the installation of an in-ground trampoline. Our dataset was small. But of the texts where a category word was used that could be viewed as arguably competing with "landscaping," we found that in-ground trampoline work was described as "installation" (62%), "landscaping" (33%), and "construction" (5%).

The datapoints are a bit messy (but such is American English). And the takeaway will depend on a judge's views on jurisprudential questions lurking beneath the surface—among other things, on how common a given meaning must be to fall within "ordinary meaning."

There is no settled view on the matter. Some courts deem a meaning "ordinary" only if it is the most common meaning of the term (or perhaps a prototypical application). Others treat a meaning as "ordinary" if it is less common but still attested. (See Part I.A of our article.)

A judge who views "ordinary meaning" as limited to the most common sense or prototypical application of a given term might come down against insurance coverage—noting that 80% of the uses of "landscaping" refer to botanical improvements and observing that in-ground trampoline work is predominantly (62%) referred to as "installation."

A judge who views "ordinary meaning" to extend to less common (but attested) senses might come down the other way—contending that the 20% rate of "landscaping" reference to non-botanical use is still substantial, as is the 33% use of "landscaping" to refer to in-ground trampoline work (particularly where "installation" may not be considered to compete with "landscaping" since installation work can be done by landscapers).

LLM AIs provide none of this authentic nuance or granular detail. They just give bottom-line conclusions and AI gloss.

In Snell, the queries themselves didn't ask for the "datapoints" or "probabilistic[] maps" that Judge Newsom said he was looking for. They asked for bare conclusions about "the ordinary meaning of 'landscaping'" and whether "installing an in-ground trampoline" "is landscaping." The chatbots responded accordingly. They said that "landscaping" "can" be used to encompass "trees, shrubs, flowers, or grass as well as … paths, fences, water features, and other elements," and stated that installation of an in-ground trampoline "can be considered a part of landscaping" but it is "a matter of opinion" depending on "how you define the term."

These aren't datapoints. And the chatbots won't provide them if asked. In Part I.B.1 of our article we show what happens if you push back on the chatbot. It will openly concede that it "can't generate empirical evidence," explaining that this would have to be "based on new observations or experiments"—"something I can't do directly."

This is a fatal flaw. The premises of textualism require more than a chatbot's bare conclusions about how "landscaping" "can" be used or on whether in-ground trampoline work "can" be considered to fall within it.

The chatbot's conclusions gloss over underlying questions of legal theory—on how common or prototypical a given application must be to count as "ordinary." And in the absence of underlying data, a judge who credits the "views" of the AI is not engaged in a transparent analysis of ordinary meaning, may not be protecting the reliance interests of the public, and may be (even subconsciously) giving voice to his own views on matters of public policy.

This is just one of several grounds on which existing LLM AIs fall short where corpus tools deliver. Our draft article develops others. We will present a couple more of them in upcoming blog posts. And on Friday we will outline some potential advantages of AIs and make some proposals for a future in which we could leverage AI to augment corpus linguistic methods while minimizing the risks inherent in existing LLM AIs.