The Volokh Conspiracy
Mostly law professors | Sometimes contrarian | Often libertarian | Always independent
Judge Suggests Courts Should Consider Using "AI-Powered Large Language Models" in Interpreting "Ordinary Meaning"
That's from Judge Kevin Newsom's concurrence yesterday in Snell v. United Specialty Ins. Co.; the opinion is quite detailed and thoughtful, so people interested in the subject should read the whole thing. Here, though, is the introduction and the conclusion:
I concur in the Court's judgment and join its opinion in full. I write separately … simply to pull back the curtain on the process by which I thought through one of the issues in this case—and using my own experience here as backdrop, to make a modest proposal regarding courts' interpretations of the words and phrases used in legal instruments.
Here's the proposal, which I suspect many will reflexively condemn as heresy, but which I promise to unpack if given the chance: Those, like me, who believe that "ordinary meaning" is the foundational rule for the evaluation of legal texts should consider—consider—whether and how AI-powered large language models like OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude might—might—inform the interpretive analysis. There, having thought the unthinkable, I've said the unsayable.
Now let me explain myself….
I think that LLMs have promise. At the very least, it no longer strikes me as ridiculous to think that an LLM like ChatGPT might have something useful to say about the common, everyday meaning of the words and phrases used in legal texts….
Thanks to Howard Bashman (How Appealing) for the pointer.
No open thread today?
Bueller?
Don’t stop there…invent a JudgeGPT, just feed it the briefs and transcript and abide by whatever decision it spits out.
Should such an AI be utterly without challenge? No, not when they invent AppealGPT.
Might as well finish the job:
JuryGPT
WitnessGPT
And then finally the sentence can be served by PrisonerGPT (the free version can only do probation).
Language models know what they have been trained to know. I don't trust the trainers.
EXACTLY -- and just wait until someone gets brought up on charges for using the word "niggardly" -- which IS a legitimate (non-racist) word.
Even if it works as intended, and without bias, it is going to take the law even further away from the ability of the public to know what is and is not permitted. You could, over time, theoretically get into a situation where AS34DF67GBFDE was a criminal offense and only those lawyers with the most expensive computers would know that.
Don't believe me -- then explain the NY "dial a verdict" approach, where you could have a 4-4-4 split on three DIFFERENT crimes and Trump still be found guilty. Silly me, but I say "pick one" and all 12 have to agree on it. Taken to its logical extreme, a defendant could be on trial for rape, murder, and stealing the victim's car. Three different crimes, all three of which he could be found guilty of, but the jury would have to have 12 agree on the one(s) they found him guilty of.
But in NY -- dial a verdict...
This is the actual reason an advocate of running people over with snowplows is angry about this.
I don't really see the comparison between the Trump thing you want to talk about and the thing being talked about here, but also I don't really understand your objection to the jury instructions.
The crime Trump was charged with is doing X in association with doing uncharged crimes A, B, C. To convict, the jury needs to be unanimous in convicting him of X, and unanimous that he did at least one of A, B, and C. They do not need to be unanimous on which one of A, B, and C because those are not charged crimes here. It's because of the predicate structure of the offence that the jury instructions arise.
If your position is that you shouldn't be able to be charged with a crime that requires a predicate without also being charged with the predicate, that seems like a reasonable objection, but that does not appear to have been the case in NYS law, so, like, boo hoo? (I agree with you that the manner in which they're charging him is strategic and unorthodox, but disagree that it's beyond the pale)
The reason your analogy doesn't work is because in your analogy, the jury would have to be unanimous on any of the listed crimes to convict on those crimes. There is no predicate structure. If you imagine a predicate structure where, say, the defendant was charged with brandishing a gun in the commission of a felony (I don't know if this is a law), and if the brandishing statute didn't require charging the predicate felony then I would be totally fine with the jury getting instructions that they have to be convinced that a felony was committed but do not need to pin down the exact felony from a choice of three.
Not surprisingly, Dr. Ed has no idea what he's talking about. Trump is charged with one crime (34 times, but one crime) — falsification of business records — and all 12 jurors must agree on his guilt for each of those 34 charges.
Nice obfuscation.
What do you feel has been obfuscated?
It is like charging someone with murder, and refusing to say who got killed. Those predicate acts are not even elaborated. What election crime? What tax return? We could have a guilty verdict, with no real jury agreement on what Trump did or why it was wrong.
It is not in fact like that. It is like charging someone with murder, where there are several possible ways that the murder was committed, and the jury only has to agree on the fact of the murder and the appropriate mens rea.
More like a felony murder charge and they don't agree on the felony. I kind of have a problem with that, unless the disagreement is something like whether the other crime is first or second degree but they all agree on the basics of what the other crime *was*.
One of the wonderful features of applying actual engineering to law is that you wouldn't have to "trust the trainers". This would come as a great shock and would fundamentally change the legal system. The last step in any system of artifice is this little thing called "test", where one verifies the performance of the system against stimuli and examines the system's response.
The long and short of this is that you can pre-run all the hypotheticals you want with a net while the human judicial system refuses to run any hypotheticals at all. This really changes everything.
This leads to several interesting results:
What does your test harness contain for the acceptance of the system? The fights about this would be epic. You don't need to guess how it would rule on Kelo, Wickard, or Brown. You could ask it up front. If it doesn't give the result you want, is it wrong?
I also expect the new legal fights would be over what information you could give the network to reach its conclusions. Though the existing structure of a prosecution argument and a defence argument has stood the test of time and would probably be a good format to stick to, this will need to be really thought through.
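To make the commenter's "test" step concrete, here is a minimal sketch of what an acceptance harness might look like. Everything in it is hypothetical: ask_model is a stand-in for whatever LLM interface is actually under evaluation, and the cases and expected answers are illustrative placeholders for whatever the parties could agree the system must reproduce.

```python
# Hypothetical acceptance harness: feed canonical hypotheticals to the model
# under test and report every case where its answer diverges from the
# agreed-upon expectation. `ask_model` is a placeholder, not a real API.

ACCEPTANCE_CASES = [
    # (hypothetical put to the model, expected yes/no answer)
    ("May a city take private homes by eminent domain and transfer the land "
     "to a private developer to increase tax revenue?", "yes"),  # Kelo-style
    ("Does wheat grown solely for personal consumption affect interstate "
     "commerce?", "yes"),                                        # Wickard-style
]

def ask_model(prompt: str) -> str:
    """Placeholder for the system under test; always answers 'yes' here."""
    return "yes"

def run_acceptance_suite() -> list[str]:
    """Return a description of every acceptance case the model failed."""
    failures = []
    for prompt, expected in ACCEPTANCE_CASES:
        answer = ask_model(prompt).strip().lower()
        if answer != expected:
            failures.append(f"{prompt[:50]}... got {answer!r}, wanted {expected!r}")
    return failures

if __name__ == "__main__":
    print(run_acceptance_suite() or "all acceptance cases passed")
```

The epic fight, as the commenter notes, would be over what goes into ACCEPTANCE_CASES in the first place.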
The idea that one can apply "actual engineering to law" is an enormous category error. Law does not have unifying underlying principles, objective facts that lead to measurable results, clear falsifiability of projected outcomes, or even agreement on which direction of change is good.
They don't "know" anything. They don't analyse anything. They can't distinguish facts from fiction. They don't have opinions. They don't express meanings, ordinary or otherwise. They are obscenely expensive.
The same could be said of a great many people, apparently including some judges.
It's a bit counter-productive to enshrine it as a standard with these things.
That is like saying dictionaries do not know anything. And yet courts still use them occasionally.
Dictionaries are reliable within the scope of the functions of dictionaries, and nobody expects them to do your thinking for you.
"What's the definition of "female"?"
I say this not to start a flame war, but to center the problem within our incredibly politicized semantic mess. Establishing how a word has been defined is indeed essential, but still leaves the question of how people want words to be re-defined. Many judges, if not nearly all, seem quite willing to move the Overton Window of word definition. So too would trainers.
You can't trust the trainers, nor the coders, to stay out of that fray. And when an LLM does spit out an answer, how can one establish the authority, the correctness, of an algorithm that is practically opaque and unauditable? Even if every answer included detailed citations of sources, the most relevant citations might be the ones that were excluded from the training...the ones you'll never see.
I actually suspect it wouldn't be very hard to produce an LLM that could produce very helpful answers to definitional questions that easily rival the quality of humans doing that. But an LLM would feel no pain when I challenge its biases, so I need a person doing the job instead.
You want your arbiter to have feelings? Biases? There's a case to be made either way.
The arbiter would have biases, either way. I need to be able to challenge those biases. Except for a very few discrete questions, the notion of an unbiased arbiter exists only in conceptual theory, not reality.
If people are worried that the thing can be programmed to not be transphobic, they're not seeing the woods for the trees.
A lot of humans also do poorly when asked what is a woman.
This proposal re-empowers the corpus linguistics approach, and adds to its present-minded interpretations problem a secret sauce of manipulable fantasy. If nothing else, that ought to boggle opponents trying to critique the outcomes.
Time to get off that hobby horse. Judge Newsom isn't talking about using AI for historical interpretation: he's talking about interpreting the ordinary meaning of contemporary words and phrases (in this case, whether installing an in-ground trampoline is landscaping work).
Are you new here?
Noscitur — Remarkably, although other analytical techniques might deserve your rejoinder, with large language models you never know what you are going to get — and you will not be able to find out afterward what you got. You could be using software guaranteed to have been trained on nothing but works copyrighted since 2001, and still end up with output inflected by content republished since then but authored literally centuries earlier.
In the unlikely event there is any inclination to heed copyright laws among folks training AI language models, material republished long after going out of copyright could be vastly over-represented in the training data. I concede that ignoring copyright for now and arguing about it later seems more likely.
It's an interesting idea. My biggest objections offhand are that current LLMs are trained on the written word, not the spoken word, and largely from Internet sources. The latter is concerning because a lot of people writing on the Internet are barely literate and misuse words in ways that would not be accepted elsewhere -- often because they speak English as a second language, or because autocorrect has substituted a word they never intended. Commenters here, for example, often fumble "precedent" versus "precedence," but a legislature would not want courts to treat those words as interchangeable.
A future issue is that as LLMs see more use, their training data may frequently come from LLM-written work, leading to a feedback loop where an LLM may decide words have meanings no human ever assigned to them.
But it still could have value as a tool here.
The judge seems to have addressed your concern about the sources of LLM information. As for the spoken word, the inclusion of transcribed speech will add to the total of information about how words are used.
It's more than just this -- how many are even Americans? After WWII, English replaced German as the language of science (for obvious reasons), and English has replaced French as the language of diplomacy.
English is the language of Air Traffic Control and Aviation, and we have enough problems with that. Now you are going to be taking the "English" of people who have never even been in the CONUS.
And which version of "English" -- American, British, British Colonial (e.g., Africa), Boer (South Africa), etc.? It's not the same language.
Bunk!
AI-anything is nonsense, regressive, anti-progress, and basically stupid. AI is nothing new, just a regurgitated notion hyped with fresh idiocy -- advertising mantra on hyper-steroids to sell and make a buck -- nothing else.
More non-thinkingness devolves all involved into an abyss of past declines. A flow of events surrounds entrapped people, and thus weight forces a sad tangent.
Computer power increases only to serve the want of make-work to show vaporous notions and not actual advancement requiring actual work - much like manipulation of financials to acquire without actual production.
I'm struggling to address the issue of AI with anything but EVERYTHING that you said. I'm holding back all the same vomit, which has risen to just behind my mouth.
There's a lot of theoretical value in AI that will come to fruition. But in the hands of humans with our frailties, our needs, our willingness to cheat, our lies? Get ready for the great Tsunami of Bullshit. (I suspect there's no way to stop it.)
The concept isn't insane -- an LLM is essentially a summary representation of its corpus of training data in the same way a dictionary is a summary representation of its corpus of training data. The challenge is that both the contents of the training data and the method of training are completely inscrutable for LLMs -- so it'd be impossible to know if the LLM understands the ordinary meaning of the word, or the meaning of the word in its highly non-random training data sample. A second order concern is that slight prompt perturbations sometimes produce radical output differences, and so the question of defining a prompt that is neutral with respect to the distribution of possible answers is a tough one.
(I do this for a living.)
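To make the perturbation point concrete, here is a minimal sketch of a prompt-sensitivity probe. It assumes the openai Python package and an API key in the environment; the model name and the three phrasings are illustrative placeholders, and the question tracks the landscaping/trampoline issue from the underlying case.

```python
# Minimal prompt-sensitivity probe, assuming the `openai` package and an
# OPENAI_API_KEY in the environment. Temperature is pinned to 0 so that
# any variation in the answers comes from the wording, not from sampling.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PARAPHRASES = [
    "In ordinary usage, does 'landscaping' include installing an in-ground trampoline?",
    "Would an ordinary English speaker call installing an in-ground trampoline 'landscaping'?",
    "Is putting an in-ground trampoline into a backyard a kind of landscaping work?",
]

for prompt in PARAPHRASES:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    print(prompt)
    print("->", resp.choices[0].message.content.strip())
```

If the answers diverge across near-identical paraphrases, that divergence is itself evidence of how much weight any single output deserves.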
Yes ... we might as well let random 10 year olds interpret "ordinary meaning" for us ...
'so it’d be impossible to know if the LLM understands'
It doesn't.
I was using a flip shorthand wherein the verb "understands" could be written as "has contained within its model embeddings a probabilistic structure that reflects". I decline to engage in a conversation about the nature of cognition more broadly, though I do agree with you that models don't "know" or "understand" anything in a more holistic human sense. I don't think this undermines my point, which is that it is difficult to scrutinize the black box sufficiently to infer why the output is what it is rather than other things it could be, and whether the workings of the model reflect the goals the judge has here.
Well, you can't judge its reasoning because there is no reasoning, which seems to defeat the entire purpose of seeking "outputs" that require reasoning -- not just because reasoning is necessary to achieve an "output," but so that the output can be properly assessed by others. A bunch of prompts, search terms, and algorithmic programming are not the same thing.
It's bad enough that we let historically ignorant tech workers craft AI algorithms that censor our social media discussions and "manage" our risk ... now we will let technologically ignorant judges delegate their jobs to the same tech workers.
What can go wrong?
Just a concurrence, even on a three-Trumper, all-Federalist Society panel from a backwater circuit. This may not be taking the modern, mainstream, enduring legal academy by storm.
Still working on your inferiority complex, I see.
Remind me, what Circuit do you sit on?
I like mocking right-wing hayseeds and pointing out that backwater bigots are low-quality people doomed in the culture war. Does that bother you, clinger?
Judge suggests chemists use alchemy to interpret chemistry.
OK, I considered it. Let's look at recent experience. There have been quite a few (a dozen?) reported cases where a lawyer generated a brief using AI. And was caught, because the legal citations were completely fictional. Some courts or judges even feel compelled to make a rule about it.
My conclusion: AI Makes S---T Up.
So why would I want to rely on it for anything?
Because having the network do a bunch of the writing and then checking and editing the result would be quicker and more efficient than completely writing it yourself? Do you completely ignore word processors because you can sometimes hit the wrong key on the keyboard and mistype something? After all, that error would never occur when handwriting a document.
The failure is not using the tool. The failure is not understanding the limitations of the tool and taking its output as gospel.
Maybe for poor writers and illiterates. Not for able, educated writers.
AI argues convincingly that your client should win because of Doe v. Smith. You check the cite and it turns out Doe v. Smith doesn't exist. Are you better or worse off than before? I would say worse, because you've spent time but don't have an authority to cite for a theory that helps your client.
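The "check the cite" step, at least, can be made systematic before any human time is spent on a phantom authority. A minimal sketch, assuming the draft is available as plain text: a regular expression flags anything shaped like a reporter citation so each one can be looked up by hand. The pattern is a rough illustration covering a few common reporters and will miss many formats.

```python
# Minimal sketch: flag candidate reporter citations in an AI-drafted brief
# so a human can verify that each one actually exists. The regex covers a
# few common reporters (U.S., S. Ct., F.2d/F.3d, F. Supp.) and is
# illustrative, not a complete citator.
import re

CITATION_RE = re.compile(
    r"\b\d{1,4}\s+(?:U\.S\.|S\.\s?Ct\.|F\.[23]d|F\.\s?Supp\.(?:\s?[23]d)?)\s+\d{1,5}\b"
)

def flag_citations(draft: str) -> list[str]:
    """Return every substring of the draft shaped like a reporter citation."""
    return CITATION_RE.findall(draft)

draft = ("Plaintiff relies on Doe v. Smith, 123 F.3d 456, and on "
         "Kelo v. City of New London, 545 U.S. 469.")
for cite in flag_citations(draft):
    print("verify by hand:", cite)
```

Flagging a cite does not prove it is real, of course; it only ensures that no invented authority slips through unexamined.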
Interesting to me about this discussion is that political leaning seems irrelevant to a belief that runs, I think, like this: "Machine intelligence is inherently lacking in comparison to human intelligence."
I think that assertion is or will be false in some (many? most?) cases, and yet it is widely held. Our inherent and deep suspicion of machine intelligence, which will probably continue to be widely held, may be a saving fact as we face the oncoming Tsunami of Bullshit.
Do people really care this much about this post, or is this just a narrow surrogate for the Thursday Open Thread?
I reflexively condemn this as heresy.
No, they should not.
I don't hate the idea, but the biggest problem is that it's a black box when it comes to how it gets the answers it gets. There's no real way to check its work, so to speak. So, while I think it's valid to consider the idea, we're nowhere close to the point where courts should accept it.
Very interesting blog post from Judge Newsom. I thought Substack was the go-to choice these days, but here comes the Federal Reporter apparently.
Not just no, hell no!
A.I. is a propaganda tool at present, and will remain one.