The Volokh Conspiracy
Mostly law professors | Sometimes contrarian | Often libertarian | Always independent
Thomas R. Lee & Jesse Egbert Guest-Blogging About AI and Corpus Linguistics
I'm delighted to report that Prof. Thomas R. Lee (BYU Law, and former Justice on the Utah Supreme Court) and Prof. Jesse Egbert (Northern Arizona University Applied Linguistics) will be guest-blogging this coming week on their new draft article, Artificial Meaning? The article is about artificial intelligence and corpus linguistics; Prof. Lee has been a pioneer in applying corpus linguistics to law. Here is the abstract:
The textualist turn is increasingly an empirical one—an inquiry into ordinary meaning in the sense of what is commonly or typically ascribed to a given word or phrase. Such an inquiry is inherently empirical. And empirical questions call for replicable evidence produced by transparent methods, not bare human intuition or arbitrary preference for one dictionary definition over another.
Both scholars and judges have begun to make this turn. They have started to adopt the tools used in the field of corpus linguistics—a field that studies language usage by examining large databases (corpora) of naturally occurring language.
This turn is now being challenged by a proposal to use a simpler, now-familiar tool—AI-driven large language models (LLMs) like ChatGPT. The proposal began with two recent law review articles. And it caught fire—and a load of media attention—with a concurring opinion by Eleventh Circuit Judge Kevin Newsom in a case called Snell v. United Specialty Insurance Co. The Snell concurrence proposed to use ChatGPT and other LLM AIs to generate empirical evidence of relevance to the question whether the installation of in-ground trampolines falls under the ordinary meaning of "landscaping" as used in an insurance policy. It developed a case for relying on such evidence—and for rejecting the methodology of corpus linguistics—based in part on recent legal scholarship. And it presented a series of AI queries and responses that it offered as "datapoints" to be considered "alongside" dictionaries and other evidence of ordinary meaning.
The proposal is alluring. And in some ways it seems inevitable that AI tools will be part of the future of an empirical analysis of ordinary meaning. But existing AI tools are not up to the task. They are engaged in a form of artificial rationalism—not empiricism. And they are in no position to produce reliable datapoints on questions like the one in Snell.
We respond to the counter-position developed in Snell and the articles it relies on. We show how AIs fall short and corpus tools deliver on core components of the empirical inquiry. We present a transparent, replicable means of developing data of relevance to the Snell issue. And we explore the elements of a future in which the strengths of AI-driven LLMs could be deployed in a corpus analysis, and the strengths of the corpus inquiry could be implemented in an inquiry involving AI tools.
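To make the methodological contrast concrete, here is a minimal sketch (ours, not the authors') of the kind of transparent, replicable corpus query the abstract describes: tallying the words that co-occur with "landscaping" in a corpus of naturally occurring language. The NLTK Brown corpus, the five-word window, and the variable names are illustrative assumptions; a real analysis would use a purpose-built corpus such as COCA and the authors' own search design.

```python
# Hypothetical sketch of a corpus-linguistics collocation query, not the
# article's actual method. The Brown corpus stands in for a purpose-built
# corpus; "landscaping" may be rare or absent in it, so the output here is
# illustrative of the workflow rather than of any substantive finding.
from collections import Counter

import nltk
from nltk.corpus import brown

nltk.download("brown", quiet=True)

TARGET = "landscaping"
WINDOW = 5  # words of context on either side of the target

collocates = Counter()
for sent in brown.sents():
    words = [w.lower() for w in sent]
    for i, w in enumerate(words):
        if w == TARGET:
            lo, hi = max(0, i - WINDOW), i + WINDOW + 1
            collocates.update(words[lo:i] + words[i + 1:hi])

# The raw co-occurrence counts are the transparent, replicable "data";
# interpreting them (e.g., whether trampoline installation counts as
# landscaping) remains the analyst's job.
for word, count in collocates.most_common(20):
    print(f"{word}\t{count}")
```

The point of the sketch is the workflow rather than the output: anyone with the same corpus and the same query can recompute every count, which is the kind of replicability the abstract contrasts with opaque, one-off LLM responses.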
Good. Let's see if these guys have any notion of the pitfalls awaiting attempts to apply context suited to present-day text to antique texts.
We all share present-minded context. A modern analyst is entitled to a bit of confidence that the context we expect to share can be applied to a modern text, and thus help to illuminate it.
None of us can be confident we share insight into the long-ago contexts that inflected documents at the times they were created. Documents arrive in the present as historical survivals, absent the context of their creation. None of that context comes to the present with the document; all of it has long ago been forgotten.
Thus, in the case of historical texts from long ago, the first step must be to infer that missing context. The process to do that—by making historical survivals critique each other—is not inherent in the practice of corpus linguistics, or even implied by that practice.
What most likely is implied—and what typically shows up in practice—is an attempt to analyze a database of antique texts by application of present-minded context. Doing that will always result in inaccurate inferences, and thus in the attribution of specious meanings to the antique texts.
The one possibly useful application of corpus linguistics to antique texts is to automate the assembly of databases of historical survivals, for use by the historical scholars who alone are capable of supplying the missing contextual inferences on the basis of additional study. I hope to see the authors address that issue in their upcoming contributions.