The Volokh Conspiracy
Mostly law professors | Sometimes contrarian | Often libertarian | Always independent
Corpus Linguistics, LLM AIs, and the Future of Ordinary Meaning
Our draft article shows that corpus linguistics delivers where LLM AI tools fall short—in producing nuanced linguistic data instead of bare, artificial conclusions.
Modern textualism is built on at least three central pillars. Textualists credit the ordinary meaning of the language of law because such meaning: (1) can reasonably be discerned by determinate, transparent methods; (2) is fairly attributable to the public who is governed by it; and (3) is expected to constrain judges from crediting their own views on matters of legislative policy.
To fulfill these goals, textualist judges are expected to show their work—to cite reliable evidence to support their conclusions about how legal words or phrases are commonly used by the public. Judicial intuition is a starting point. But judges who ask the parties and the public to take their subjective word for it are not engaged in transparent textual analysis; cannot reliably be viewed as protecting public reliance interests; and may (at least subconsciously) be advancing their own views on legislative policy.
The Snell concurrence acknowledges these concerns (as do the academic pieces it relies on). But the tools it advances (AI LLMs) fall short of fulfilling these key premises. Corpus linguistic tools, by contrast, are up to the task.
We show how in our draft article. In Part III we investigate the empirical questions in Snell through the tools of corpus linguistics. We performed transparent searches aimed at assessing (a) how the term "landscaping" is commonly used in public language; and (b) whether people commonly use the term "landscaping" when they speak of the installation of in-ground trampolines. Our results are granular and nuanced. They stand in contrast to the conclusory assertions of AI chatbots—conclusions that gloss over legal questions about the meaning of "ordinary meaning" and make it impossible for a judge to lay claim to any sort of transparent, determinate inquiry into the ordinary meaning of the language of law.
Our first search was aimed at uncovering empirical evidence on the conceptual meaning of "landscaping"—whether and to what extent this term is limited to improvements that are botanical and aesthetic or instead encompasses non-botanical, functional improvements. We sought empirical data on that question by searching for all uses of the term "landscaping" in the Corpus of Contemporary American English. Our search yielded 2,070 "hits" or passages of text containing the term. We developed and applied a transparent framework for "coding" a random sample of 1,000 of those hits (as detailed in Part III.A of our article). And we found that "landscaping" (a) is sometimes used to encompass improvements of a non-botanical nature (in about 20% of the passages of text we coded), and (b) is regularly used to extend to improvements for a functional purpose (in about 64% of codable texts).
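The coding-and-tallying step described above can be sketched in a few lines. The passages and labels below are invented stand-ins, not data from the study; the sketch only shows how hand-coded labels on a sample of corpus hits become the kind of proportions the article reports.

```python
# Hypothetical hand-coded corpus "hits" (concordance lines). In the actual
# study, human coders classified each passage under a published framework;
# here the labels are illustrative only.
coded_hits = [
    {"text": "new landscaping with shrubs and flower beds", "botanical": True,  "functional": False},
    {"text": "landscaping included a retaining wall",       "botanical": False, "functional": True},
    {"text": "landscaping the yard with native grasses",    "botanical": True,  "functional": False},
    {"text": "drainage landscaping to redirect runoff",     "botanical": False, "functional": True},
    {"text": "landscaping with trees for shade",            "botanical": True,  "functional": True},
]

def proportion(hits, key):
    """Share of codable hits labeled True for the given feature."""
    return sum(h[key] for h in hits) / len(hits)

non_botanical = 1 - proportion(coded_hits, "botanical")
functional = proportion(coded_hits, "functional")
print(f"non-botanical: {non_botanical:.0%}, functional: {functional:.0%}")
```

The point of the exercise is transparency: anyone can inspect the coding framework, re-code the sample, and recompute the percentages.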
Our second search considered the ordinary meaning question from the standpoint of the "referent" or application at issue in Snell. It asked whether and to what extent "landscaping" is the term that is ordinarily used to refer to installation of an in-ground trampoline. Here we searched the 14 billion word iWeb corpus for references to "in-ground trampoline," "ground-level trampoline," "sunken trampoline," and more (as detailed in Part III.B. of our article). And we found that "landscaping" is a common term—if not the most common term—used to describe the type of work involved in the installation of an in-ground trampoline. Our dataset was small. But of the texts where a category word was used that could be viewed as arguably competing with "landscaping," we found that in-ground trampoline work was described as "installation" (62%), "landscaping" (33%), and "construction" (5%).
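The second search reduces to counting which competing category word appears alongside references to in-ground trampoline work. A minimal sketch follows; the raw counts here are hypothetical, chosen only so the shares land near the 62/33/5 split reported above, and are not the study's actual tallies.

```python
from collections import Counter

# Hypothetical category words observed near "in-ground trampoline" mentions,
# standing in for the competing terms tallied from the iWeb corpus.
category_words = (
    ["installation"] * 13 + ["landscaping"] * 7 + ["construction"] * 1
)

counts = Counter(category_words)
total = sum(counts.values())
shares = {word: round(100 * n / total) for word, n in counts.items()}
print(shares)  # each competing term's share of the codable texts, in percent
```

Because every step from raw hits to percentages is inspectable, a court or opposing party can audit the result rather than take it on faith.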
The datapoints are a bit messy (but such is American English). And the takeaway will depend on a judge's views on jurisprudential questions lurking beneath the surface—among other things, on how common a given meaning must be to fall within "ordinary meaning."
There is no settled view on the matter. Some courts deem a meaning "ordinary" only if it is the most common meaning of the term (or perhaps a prototypical application). Others treat a meaning as "ordinary" if it is less common but still attested. (See Part I.A of our article.)
A judge who views "ordinary meaning" as limited to the most common sense or prototypical application of a given term might come down against insurance coverage—noting that 80% of the uses of "landscaping" refer to botanical improvements and observing that in-ground trampoline work is predominantly (62%) referred to as "installation."
A judge who views "ordinary meaning" to extend to less common (but attested) senses might come down the other way—contending that the 20% rate of "landscaping" reference to non-botanical use is still substantial, as is the 33% use of "landscaping" to refer to in-ground trampoline work (particularly where "installation" may not be considered to compete with "landscaping" since installation work can be done by landscapers).
LLM AIs provide none of this authentic nuance or granular detail. They just give bottom-line conclusions and AI gloss.
In Snell, the queries themselves didn't ask for the "datapoints" or "probabilistic[] maps" that Judge Newsom said he was looking for. They asked for bare conclusions about "the ordinary meaning of 'landscaping'" and whether "installing an in-ground trampoline" "is landscaping." The chatbots responded accordingly. They said that "landscaping" "can" be used to encompass "trees, shrubs, flowers, or grass as well as … paths, fences, water features, and other elements," and stated that installation of an in-ground trampoline "can be considered a part of landscaping" but it is "a matter of opinion" depending on "how you define the term."
These aren't datapoints. And the chatbots won't provide them if asked. In Part I.B.1 of our article we show what happens if you push back on the chatbot. It will openly concede that it "can't generate empirical evidence," explaining that this would have to be "based on new observations or experiments"—"something I can't do directly."
This is a fatal flaw. The premises of textualism require more than a chatbot's bare conclusions about how "landscaping" "can" be used or on whether in-ground trampoline work "can" be considered to fall within it.
The chatbot's conclusions gloss over underlying questions of legal theory—on how common or prototypical a given application must be to count as "ordinary." And in the absence of underlying data, a judge who credits the "views" of the AI is not engaged in a transparent analysis of ordinary meaning, may not be protecting the reliance interests of the public, and may be (even subconsciously) giving voice to his own views on matters of public policy.
This is just one of several grounds on which existing LLM AIs fall short where corpus tools deliver. Our draft article develops others. We will present a couple more of them in upcoming blog posts. And on Friday we will outline some potential advantages of AIs and make some proposals for a future in which we could leverage AI to augment corpus linguistic methods while minimizing the risks inherent in existing LLM AIs.
The one question I have about corpus linguistics is: how do you know the universe you analyze is actually a fair representation of the use of a word?
For instance, back in 1789, the word "infringed" might have had a different meaning than we commonly think of today. But in doing the comparison, how do you know the universe of material you are searching from that 1789 period is actually a fair representation of what "infringed" meant to people at that time?
If you are running corpus linguistics analysis on only 5% of the material written from that time, is that 5% universe truly representative?
I am not so sure, but would love to understand that aspect better.
One related problem (among many) with 1789: the entire context of everyday life then was lived outside the scope of experience that a modern analyst can access, absent years of reading documents from that period. Given that, a great deal of the material necessary to understand a term or word in its particular historical context will not include, even obliquely, the word or term a modern analyst will choose as an input for a corpus linguistics analysis.
For instance, life in 1789 involved no use of electricity. What effects did the absence of electricity have on the administration of justice then—effects that would not typically occur to a present-day analyst who thinks he crafted an adequate contextual search, but whose chosen inputs turned up nothing to do with electricity? He would miss a prompt that his modern experience relies on. He would thus fail to note how that difference might inflect a comparison between his present-minded context and the original context in which the text in question was created.
Would it be challenging for someone today to write comprehensively on how judicial practice changed after 1789, when at various times, and in various places, electrically-related devices, customs, and services were one-by-one introduced into the activities of judicial practice, over almost 2 centuries, since for instance the invention of the telegraph? Consider as a topic: “Habeas corpus review in the absence of electrical service.”
It would be nearly impossible to do that review. Indeed, the most practical method anyone could think to try would be to explore the records of judicial activity during various eras, and in various places, prior to the use of electricity, and then after years of such study make a comparison on that basis to modern practices. That describes the practice of academic history, not the method of corpus linguistics.
But note, if corpus linguistics were tried, none of the 1789 records relevant to the desired comparison—effects of electrical use on judicial administration—could be found. Entering any term related to "electricity" into a textual database search of texts from that era would turn up only a few references to electrical phenomena as curiosities, and deliver zero insight into the question. But that would in no way be proof that, compared to the tacit contextual expectations of a modern analyst, lack of electrical use in 1789 made no difference. It certainly did make a difference. Problem is, absent academic historical study, we have no way to know what difference, and corpus linguistics will prove powerless to enlighten us.
I love any term with “Corpus” in it. Think it goes back to the Law Books on the closing credits of Perry Mason (“Corpus Juris Secundum”) the Outer Limits Episode “Corpus Earthling” and of course the “Corpus Delicti” and in Med School you had your Corpus Callosum, (Did you know there are people born without one? they do just fine) Corpus Cavernosum (Did you know there are people born without one? they don’t do just fine) the Corpus Luteum (It’s sort of like the Raincoat your mom the Ovary tells you to take with you when you leave, because it’s cold and wet in the Fallopian Tubes, but you don’t listen and leave it behind, it’s even Yellow, like a Raincoat. that was an actual Med School Lecture Explanation I just gave you)
also love some of the Geographical Corpi, Corpus Christi (Bring your Corpus Cavernosum)
Frank
People use words in face-to-face transactions in ways that differ from how they are used in printed documents. This is more true the further back you go, where communication was more verbal than written and printed matter was more scarce.
How do you know that? = "People use words in face-to-face transactions in ways that differ from how they are used in printed documents. This is more true the further back you go, where communication was more verbal than written and printed matter was more scarce."
I am not following your logic.
Have you ever seen people you know use words in their legal sense? I did not grow up around lawyers — I grew up around deliverymen, deli workers, deli owners, contractors — and I never heard them use a word like "process" (to use an example) in its legal sense. If they used it at all, it would not be related to a guy knocking on your door and shoving a bunch of papers in your face. I assume this was even more true in an era when most people had no education at all and could barely read and write.
captcrisis — Interesting point and good example.
One point of caution. It invites error to assume any long-running tendency toward particular social change can be reliably extrapolated indefinitely. Nor is it wise to assume general social tendencies applied uniformly across regions.
A commonplace example of that kind of error is the assumption that the farther back in time you look in American history, the more religious the people were. Modern religionists especially cherish that notion, and often insist upon it.
If you start with a sense of the present as less religious than the past in America, that can be a somewhat reliable generalization for some regions, going back at least to the Civil War. Prior to that, however, religious fervor was perhaps as various in particular regions as it is today, and during the founding era less generally intense than at most times before or after.
Modern expectations about the piety of the past might be overturned by an encounter with Hamiltonian New York. It is a matter of record that encountering the impiety of that place for the first time startled at least the young Alexander Hamilton. In general, the founding era seems to have been a more secular period than most, in more places.
But that did not mean that Thomas Paine’s pronounced founding-era secular renown would prove sufficient to keep his reputation safe until he died, reviled by many for impiety, in 1809.
You have to take your historical tendencies as you find them.
Ok, now I understand; different meanings for the same word, depending on context used. Appreciate the explainer (i.e. process).
Handshake!
I agree. How can you know what verbal usage was like before recordings were possible? I doubt there were very many verbatim transcriptions, and those suffer from the same problem as transcriptions today, where they get cleaned up to avoid all the umms and errs and pauses and back steps.
The AI is able to use terms like landscaping, and it uses the terms consistently with how they are commonly used. So I think the AI chatbots can be useful. They could have biases, but so does every other source.
“And we found that “landscaping” is a common term—if not the most common term—used to describe the type of work involved in the installation of an in-ground trampoline.”
Who used that term in reference to inground trampolines? Was it the companies selling the inground trampolines? If I want an inground trampoline installed in my backyard, do I search for landscapers or trampoline installers?
In Ground Trampolines Installation
“Remember that the hole for the trampoline will displace a large amount of dirt and you may need to haul the dirt off or have your landscaper do it for you.”
Apparently it’s not unusual to have a landscaper do at least part of the work. But the same could be said for an in ground pool or hot tub, and I’m not sure I’d call them “landscaping”, either.
I think it’s just a marginal case. Those are, unavoidably, going to exist.
I don’t see how an LLM can reliably distinguish between a literal and metaphorical usage. For example, if I say, there’s an invasion of Mexican rapists across the Rio Grande, am I being literal or metaphorical?
That is a good point, had not considered that aspect. And you’re right, there is common use, hyperbole, metaphorical.
Translators have been dealing with that for millennia.
No, he did not literally run to the store.
Yes, but dictionaries and other sources have the same problems.
Penetration, however slight, is sufficient to commit the act
corpus linguistics
with some corpus gymnastics