The Volokh Conspiracy
Mostly law professors | Sometimes contrarian | Often libertarian | Always independent
D.C. Judge's Thoughts on Use of AI by Judges
From D.C. Court of Appeals Judge John Howard's concurrence last month in Ross v. U.S., about the possible upsides and downsides of judges using AI (entirely apart from whether they use AI results as arguments in their opinions):
To be clear, I cast no aspersion on the use of AI by my colleagues. I find it interesting. AI tools are proliferating and we ignore them at our own peril. Not only for the concerning capabilities they now give parties with ill intent, but for the great utility such tools could potentially provide in easing the strain on our increasingly overburdened courts.
AI tools are more than a gimmick; they are coming to courts in various ways, and judges will have to develop competency in this technology, even if the judge wishes to avoid using it. Courts, however, must approach, and are approaching, the use of such technology cautiously. Specific use cases are being considered, and we must always keep in mind the limits of different AI tools in how and when we use them, particularly with regard to security, privacy, reliability, and bias, to ensure ethical use.
Broadly, an AI system can be susceptible to bias at multiple points in its execution. Model Code of Judicial Conduct Rules 2.2 and 2.3, dealing respectively with impartiality and fairness and with bias, prejudice, and harassment, are potentially implicated by reliance on a system infected with bias. Ignorance of the technology seems like little defense in light of the duty of competence in Rule 2.5.
Other issues abound, but security and confidentiality of court information are particular concerns. Accordingly, before using an AI tool a judicial officer or staff member should understand, among many other things, what data the AI tool collects and what the tool does with their data.
The quote "if it is free, you are the product" has been attributed to many sources. Many AI tools benefit from what we feed into them: documents, prompts, and virtually every other aspect of our interaction train and hone such tools. That is part of the early-mover advantage of ChatGPT in particular, which blew away previous records, reaching one million users in five days and 100 million within two months of going live. As of January 30, 2025, it was estimated to have approximately 300 million weekly users. It is hard to imagine a company that could afford to pay that many people to test and develop its model. However, such a system raises serious practical and ethical issues for a court. Security is a preeminent concern. I briefly look at a few hypotheticals in the context of this court to illustrate.
First, take the use case of a judge utilizing an AI tool to summarize briefs filed with the court well in advance of oral argument—a practice, along with summarizing voluminous records, that some AI tools appear to be quite adept at. It is the practice of this court to announce the members of a particular panel of judges the week before an oral argument. Should a judge be using an AI tool that trains on the data they submitted, they have now surrendered data which includes—at bare minimum—the submitted data, i.e. the briefs of the parties, and potentially personally identifying data, i.e. a username, IP address, and email address. Data which, reviewed together, could expose the judge's involvement on the panel to individuals and systems with access to that data before that information is public.
Next, fast-forward past argument and assume our hypothetical technophile jurist decides to have the AI tool aid in the preparation of a decision. AI tools offer many potential use cases here. For one, perhaps with careful prompting detailing the types of facts or the narrative desired, the AI tool could be used to pull from the record and produce a first draft of the factual rendition section of the decision. It could develop an initial statement of the standard of review and controlling law. In varying degrees of quality, depending on the tool and inputs, it could formulate a first take at some analysis.
However, again, should the AI tool be training itself on the data, someone with access to that data would have access to judicial deliberative information and potentially to personally identifying login or user information that could identify the judge as well. Of even more concern, as the data trains the tool, another user could stumble upon it, or some aspects of it, regurgitated by the AI tool. Even if the odds are minuscule, confidential judicial deliberative information has potentially leaked out ahead of a decision in this scenario.
Consider further the scenario in which any of the material used in either prior hypothetical contained sensitive information that would otherwise be subject to redaction, e.g., social security numbers, account numbers, minors' names, etc. If unredacted briefs or records were loaded into the AI tool, it would be an instant failure of the court's duty to protect such information. Three hundred million users, in the ChatGPT scenario described above, would potentially have access.
I pause briefly here to note that no such concern appears to arise from the use of AI in this decision. The dissent's generalized hypothetical questioning, without more, does not strike me as remotely unique to this case in a way that could even inadvertently expose deliberative information. The majority's use of ChatGPT provides a comparison, prompting the tool with the facts of a previous case for analysis. It strikes me that the thoughtful uses employed by both of my colleagues are good examples of judicial AI tool use for many reasons, including the consideration of the relative value of the results, but especially because it is clear that this was no delegation of decision-making, but instead the use of a tool to aid the judicial mind in carefully considering the problems of the case more deeply. Interesting indeed.
The previous examples that I described as potentially improper uses of an AI tool could, however, be accomplished with an AI tool that has robust security and privacy protections. Even more exciting, AI companies have begun to announce the release of government-oriented tools which promise to provide such protections and allow for such potential use cases.
As state courts across the country cautiously consider these issues, the National Center for State Courts has taken a lead in coordinating efforts. It has put together an AI Rapid Response Team and created a policy consortium, constantly updating resources. And the D.C. Courts have not stood idly by, creating our D.C. Courts AI Task Force and partnering with the National Center for State Courts. As the use of AI begins to appear at the D.C. Courts, litigants and the citizens of the District can be assured that cautious and proactive thought is being directed by our judges and D.C. Courts team members toward the beneficial, secure, and safe use of AI technology.
It is true that whatever a judge submits to ChatGPT may end up in OpenAI's database, where it may be accessible to OpenAI employees and, in the event that OpenAI suffers a data breach, to the hackers. Those are things that judges should be concerned about. However, this judge also seems to be concerned that by training future models on that data, the data might become accessible to other users of ChatGPT. And, um, that is just not how LLMs work. That is not a risk of the technology.
I would also note that this all presumes that you are running a model on an AI company's servers. That is surely the most common way to run an LLM, and the only way to run some of the good models like ChatGPT. However, it is not the only way. A person with moderate technical sophistication can absolutely run some LLMs, such as Meta's Llama, locally on their own computer. That would not pose any of the risks discussed here.
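For what it's worth, here is what "locally" can look like in practice: a minimal sketch, assuming the Ollama runtime is installed, serving on its default local port, and already holding a pulled Llama model. The model name, endpoint, prompt, and ask_local_llm helper are illustrative assumptions, not anything from the opinion or the comments.

```python
# A minimal sketch, assuming the Ollama runtime (https://ollama.com) is running
# locally on its default port and a Llama model has already been pulled with
# `ollama pull llama3`. Nothing sent to this endpoint leaves the machine.
import json
import urllib.request

def ask_local_llm(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one complete response rather than a token stream
    }).encode("utf-8")
    request = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]

if __name__ == "__main__":
    # Hypothetical use echoing the opinion's summarization scenario: the brief
    # text stays on the judge's own machine.
    print(ask_local_llm("Summarize the appellant's main argument in two sentences:\n<brief text>"))
```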
It is how Retrieval-Augmented Generation (RAG) works, although that doesn't involve additional training.
I promise you that when you submit a query to ChatGPT, it is not doing RAG on queries previously submitted by other users. No AI company provides LLMs that do that.
The article expressed concern about judges "using an AI tool that trains on the data they submitted". RAG is a well-known technique for building a domain-specific chatbot on top of a very large general-purpose model, to get some of the advantages of the very large model without its very large training costs. There are even blog entries giving detailed instructions on how to do this.
So, even if ChatGPT itself doesn't train using your data (not that they couldn't, but their current policy is that they will not if you are using a paid service and have not opted in to sharing your data), a law-specific bot might very well be built on top of GPT or some other large model using RAG, and how it isolates your data is up to the implementer of that service.
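To make the RAG mechanics concrete, here is a minimal sketch of the pattern described above: a small store of domain-specific texts sits in front of a large general-purpose model. It is illustrative only; TF-IDF stands in for the neural embeddings a production system would use, the sample documents are invented, and call_llm() is a placeholder for whatever hosted or local model the implementer actually wraps.

```python
# Minimal RAG sketch: retrieve the most relevant stored texts, paste them into
# the prompt, and hand the prompt to a general-purpose model. The base model's
# weights are never changed.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Opinion A: evidentiary rulings are reviewed for abuse of discretion.",
    "Opinion B: hearsay admitted in error is reviewed for harmlessness.",
    "Practice note: reply briefs may not raise new issues.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)  # the "database" of stored texts

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k stored documents most similar to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("placeholder for the underlying general-purpose model")

def answer(question: str) -> str:
    # The retrieved texts are inserted verbatim into the prompt.
    context = "\n".join(retrieve(question))
    return call_llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```

The design point that matters for this thread: whatever sits in the document store is kept verbatim and can come back out in a response, which is why it matters whether an implementer ever puts user-submitted material there.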
You seriously think that OpenAI, or any AI company, is going to use the queries that large numbers of users submitted in the past in a RAG model? You are veering into conspiracy theory levels of crazy here. Firstly, on a public relations level, there would be enormous outrage at the violation of privacy. Nobody thinks that the queries they submit to an LLM are going to be published like that. Secondly, it would make no sense on an engineering level. RAG trades smaller training costs for higher inference costs. If you put a large number of prior queries into a RAG database, your inference costs are going to be crazy high. The obviously better way to build a specialized legal LLM, if you are going to use those queries at all, is to use them as training or fine tuning data. So even if OpenAI does build a law-specific LLM at some point, I guarantee it will not use prior user queries in a RAG database.
Use query data for training or fine-tuning? I have been assured that's just not how LLMs work!
Anyway, once you are using RAG, assembling a larger corpus of domain-specific data doesn't increase inference costs if you make retrieval more selective. That's why you are doing retrieval in the first place: to avoid adding the entire corpus into the prompt.
> Use query data for training or fine-tuning? I have been assured that’s just not how LLMs work!
I’m not sure where the confusion is. Maybe you just don’t understand how LLMs are built. So I’ll go through it slower.
The basic process of constructing an LLM is called training. Large amounts of data are used and weights are trained. This requires a very large amount of compute, and for a remotely decent model it is just impossible outside of the large data centers that these large companies have. But notably, the LLM is just learning associations between things; it is not memorizing text. So even if a user's query to a prior LLM were included in the training data, a person using the LLM would not get access to that query.
Fine-tuning is essentially the same process, except you start with an already-trained LLM and feed it very specific sorts of data to make it better at that specific sort of data (a minimal sketch of that loop appears after this comment). Again, all it is learning is associations. It is not memorizing text. So even if a user's query to a prior model were included in the data used for fine-tuning, that query would not be accessible to users of the new LLM. This is what the judge seems to think can happen, and what I am asserting cannot happen.
RAG is something different. RAG is essentially combining an already-trained LLM with a database where actual texts are stored. It is technologically possible to put prior user queries into such a database, but as I explained, it would be dumb both from an engineering and a public relations standpoint. On the engineering side, you are just wrong. The larger the database to be searched, the more time that search will take. This is true of all databases, whether AI is involved or not. The extra time isn't in the LLM's processing of the results of the database search, assuming you apply a constant limit to the results of the database search. But the database search itself will take more time.
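Since the thread keeps circling what "training" and "fine-tuning" actually do, here is a minimal sketch of the loop the commenter describes, assuming the Hugging Face transformers library and PyTorch, with GPT-2 standing in for a real foundation model and two invented sentences standing in for the fine-tuning data. The model name, texts, and output path are illustrative assumptions.

```python
# Minimal causal-LM fine-tuning loop: start from a pretrained model and update
# its weights on new examples. Note what is saved at the end: weights and
# config, not the training texts themselves.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

training_texts = [  # hypothetical domain-specific examples
    "Evidentiary rulings are reviewed for abuse of discretion.",
    "A harmless error does not warrant reversal of the judgment below.",
]

model.train()
for epoch in range(3):
    for text in training_texts:
        inputs = tokenizer(text, return_tensors="pt")
        # For causal-LM fine-tuning the labels are the input tokens themselves;
        # the loss measures how well the model predicts each next token.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

model.save_pretrained("my-fine-tuned-model")  # saves weights and config only
```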
I understand just fine. You said inference costs would rise; the cost of running a vector search on a database is not an inference cost.
Inference doesn't just mean a forward pass through a neural network. Inference is anything that happens between when a user submits a query and when the model returns a response. In the case of a RAG system, yes, that does include a search of a database.
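A toy benchmark may help square the last two comments: in the sketch below, the brute-force search step gets slower as the corpus grows, while the amount of retrieved text handed to the model stays fixed at k chunks; whether that search time is booked under "inference" is the terminology dispute above. Everything here (corpus sizes, embedding dimension, k) is invented for illustration.

```python
# Toy illustration: brute-force vector search time grows with corpus size,
# but the number of chunks sent to the model stays fixed at k. Real systems
# use approximate-nearest-neighbor indexes to keep the search step sub-linear.
import time
import numpy as np

rng = np.random.default_rng(0)
k, dim = 5, 128  # retrieval depth and embedding dimension (assumed)

for corpus_size in (10_000, 100_000, 1_000_000):
    corpus = rng.standard_normal((corpus_size, dim)).astype(np.float32)
    query = rng.standard_normal(dim).astype(np.float32)

    start = time.perf_counter()
    scores = corpus @ query              # brute-force similarity search
    top_k = np.argsort(scores)[-k:]      # indices of the k best matches
    elapsed_ms = (time.perf_counter() - start) * 1000

    print(f"corpus={corpus_size:>9,}  chunks sent to model={len(top_k)}  search={elapsed_ms:.1f} ms")
```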
Yes, but what is AI's thoughts on use of AI by judges?
Or the use of judges by AIs?
They could remove a lot of concerns by running AI locally on their own server. It's not hard to do.
It would also allow them to do some limited training to fine-tune the model for their needs.
First, take the use case of a judge utilizing an AI tool to summarize briefs filed with the court well in advance of oral argument—a practice, along with summarizing voluminous records, that some AI tools appear to be quite adept at.
You mean judges are too lazy to read the briefs? Sheesh. (The record is something else, that can really be voluminous and much of it irrelevant to the appeal. But if the briefs cite to the record, those parts should be read.)
To me, this shows one of the problems with the "safety" aspects of AI. It creates a purposeful bias in favor of any kind of safety decision. That means that when asked about a situation that could potentially put an animal at risk, it always goes overboard on the safety side. If you ask OpenAI's chatbot about leaving your dog in the car for 60 seconds, it says that is harmful to the dog. Any dog owner would know that 60 seconds would not be harmful, but the AI is trained to exaggerate risks of harm to ensure its output does not itself cause harm. That only works if you do not trust the AI to produce accurate accounts of such harm, yet at least one judge seemed to accept its account.
I think it is a mistake to conflate AI's bias and reliability problems with its confidentiality problems.
The confidentiality problems can now be easily resolved by paying for a 'private' instance of the AI tool - that is, one where the vendor does contractually commit to protecting your input data and to not using it to further train their systems. This is now a fairly common option for commercial users.
Nothing they can currently do, however, will solve the problems of input bias, training bias and all the other reliability problems.
Of course, even if the vendor promises not to use the input data themselves, you still have questions about transmission security BETWEEN the judge and the AI tool. For really high-stakes national security cases, that matters too. Eavesdroppers are everywhere.
If that's not thoroughly covered in your contract and IT implementation, it's time to fire your IT staff.
Will you be publishing AI's thoughts on ..... D.C. Judge's Thoughts on Use of AI by Judges?
What is your point?