Corpus Linguistics, LLM AIs, and the Assessment of Ordinary Meaning
As we show in a draft article, corpus linguistic tools can do what LLM AIs cannot: produce transparent, replicable evidence of how a word or phrase is ordinarily used by the public.
More and more, judges are treating the assessment of ordinary meaning as an empirical matter: an inquiry into the way legal words or phrases are commonly used by the public. This inquiry is viewed as furthering some core tenets of textualism, which treats the assessment of ordinary meaning as transparent, determinate, and constraining, much more so than a free-wheeling inquiry into the intent of the legislative body.
For many years and for many interpretive questions, dictionaries were viewed as the gold standard. To resolve an interpretive question, all the judge had to do was declare that the ordinary meaning of the text controls, note that dictionaries are reliable evidence of such meaning, and cite a dictionary definition as the decisive basis for decision.
Over time, both scholars and judges have come to question the viability of that approach—especially in cases where competing dictionary definitions provide support for both sides of a case. In that event, a judge's intuitive preference for one definition over another isn't transparent. And it isn't any more constraining than a subjective assessment of legislative intent.
That does not mean that the ordinary meaning inquiry is lost. It just means that we need more sophisticated tools to answer it.
Increasingly, scholars and judges are acknowledging that the empirical dimensions of the ordinary meaning inquiry call for data. And they are turning to tools aimed at producing transparent, replicable evidence of how the language of law is commonly or typically used by the public. A key set of those tools comes from corpus linguistics, a field that studies language use by examining large databases (corpora) of naturally occurring language.
When properly applied, corpus linguistic analysis delivers on the promises of empirical textualism. A judge who is interested in assessing how a term is commonly used by the public can perform a replicable search of that term in a corpus designed to represent language use by the relevant speech community. Such a search will yield a dataset with replicable, transparent answers to the empirical dimensions of the ordinary meaning inquiry—as to whether and how often a term is used in the senses proposed by each party. And that data will provide the judge with a determinate basis for deciding the question presented.
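To make that procedure concrete, here is a minimal sketch of one piece of it, written in Python against a hypothetical plain-text corpus file (the file name, the simple tokenizer, and the context window are our illustrative assumptions, not features of any particular corpus platform). It extracts "key word in context" (KWIC) concordance lines, the raw material a researcher would then read and code by sense.

```python
# Minimal sketch of a replicable corpus query (hypothetical corpus file; standard library only).
import re
from pathlib import Path

def kwic(corpus_path: str, term: str, window: int = 8) -> list[str]:
    """Return key-word-in-context (KWIC) lines for `term` from a plain-text corpus."""
    tokens = re.findall(r"[A-Za-z']+", Path(corpus_path).read_text(encoding="utf-8"))
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == term.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}")
    return lines

if __name__ == "__main__":
    # "corpus.txt" is a placeholder; a real study would query a corpus built to
    # represent language use by the relevant speech community.
    for line in kwic("corpus.txt", "landscaping"):
        print(line)
```

Because the query is a deterministic search over a fixed corpus, anyone with the same corpus and the same search can reproduce the same concordance lines, which is what makes the resulting evidence transparent and replicable.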
Take the real-world case of Snell v. United Specialty Insurance, which raised the question whether the installation of an in-ground trampoline falls within the ordinary meaning of "landscaping" under an insurance policy. Dictionaries highlight the potential plausibility of both parties' positions—indicating that "landscaping" can be understood to mean either (a) botanical improvements to an outdoor space for aesthetic purposes or (b) any improvement to an outdoor space for aesthetic or functional purposes.
A researcher interested in assessing which of these definitions reflects the common or typical use of "landscaping" could search for that term in a corpus of naturally occurring language. Such a search would yield actual data on how often "landscaping" is used in either of the above ways (or how often it is used to encompass a non-botanical improvement like an in-ground trampoline). And the resulting data would inform the ordinary meaning inquiry in ways that a dictionary could not.
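Here, again as a hedged illustration rather than a real study, is a sketch of the tallying step that would follow: the researcher reads each concordance line, codes it as reflecting the narrower botanical sense or the broader any-outdoor-improvement sense, and then computes how often each sense appears. The labels and counts below are invented placeholders, not actual corpus results.

```python
from collections import Counter

# Hypothetical sense codes assigned by the researcher after reading each
# concordance line in context. These are illustrative placeholders only.
coded_lines = [
    "botanical",  # e.g., "... new landscaping: shrubs, sod, and flower beds ..."
    "broader",    # e.g., "... landscaping such as a retaining wall and stone path ..."
    "botanical",
    "broader",
    "botanical",
]

counts = Counter(coded_lines)
total = sum(counts.values())
for sense, n in counts.most_common():
    print(f"{sense}: {n}/{total} ({n / total:.0%})")
```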
This sort of corpus inquiry has been proposed and developed in legal scholarship. And it has taken hold in courts throughout the nation, with justices of the United States Supreme Court and judges on various federal and state courts citing corpus methods in their analysis of ordinary meaning (see fn 22 in our article).
In recent months, critics of corpus linguistics have swooped in with something purportedly better: AI-driven large language models (LLMs) like ChatGPT. The proposal began with two recent law review articles. And it caught fire—and a load of media attention—with a concurring opinion by Eleventh Circuit Judge Kevin Newsom in the Snell case.
Judge Newsom began by noting that dictionaries did little more than highlight a basis for disagreement (on whether "landscaping" is commonly or typically limited to natural, botanical improvements). As a textualist, he said he "didn't want to be that guy"—the judge who picks one definition over another based on his "visceral, gut instinct." So he went looking elsewhere. "[I]n a fit of frustration" or a "lark" he queried ChatGPT, asking (a) "What is the ordinary meaning of 'landscaping'?," and (b) "Is installing an in-ground trampoline 'landscaping'?"
ChatGPT told Newsom that "landscaping" "can include" not just planting trees or shrubs but "installing paths, fences, water features, and other elements to enhance the . . . functionality of the outdoor space" and that "installing an in-ground trampoline can be considered part of landscaping."
Newsom found these responses "[i]nteresting," "sensible," and aligned with his "experience." He channeled points developed in recent scholarship. And he came down in support of the use of AIs as "high-octane language-prediction machines capable of probabilistically mapping . . . how ordinary people use words and phrases in context"—or tools for identifying "datapoints" on the ordinary meaning of the language of law.
Newsom doubled down on this view in a more recent concurrence in United States v. DeLeon, a case raising the question whether a robber who holds a victim at gunpoint has "physically restrained" him within the meaning of a sentencing-guidelines enhancement. Again the question was not easily resolved by resort to dictionaries. And again Newsom turned to LLM AI queries, which he likened to a "survey" asking "umpteen million subjects, 'What is the ordinary meaning of "physically restrained"?'"
The Newsom concurrences are intriguing. Perhaps they are to be applauded for highlighting the need for empirical tools in the assessment of ordinary meaning. But LLM AIs are not up to the empirical task. They don't produce datapoints or probabilistic maps. At most they give an artificial entity's conclusions on the interpretive question presented, in terms that may align with a judge's intuition but won't provide any data to inform or constrain it.
AI apologists seem to be thinking of chatbot responses as presenting either (1) the results of an empirical analysis of language use in the LLM's training corpus (as suggested in Snell) or (2) the results of a pseudo-survey of many people, through the texts they authored (as suggested in DeLeon). In fact they are doing neither. As we show in our article and will develop in subsequent posts, at most AIs are presenting (3) the "views" of a single, artificial super-brain that has learned from many texts.
The apologists need to pick a lane. AI chatbots can't be doing more than one of these three things. And if they are (as we say) just presenting the rationalistic opinion of a single artificial entity, they aren't presenting the kind of empirical data that aligns with the tenets of textualism.
We raise these points (and others) in a recent draft article. And we will develop them further here in posts over the next few days.