The Volokh Conspiracy

Mostly law professors | Sometimes contrarian | Often libertarian | Always independent

AI in Court

D.C. Judge Uses ChatGPT in Discussing Whether "Common Knowledge" Inference in Criminal Case Was Justified


In Ross v. U.S., decided last month by the D.C. Court of Appeals, defendant had left her dog Cinnamon for over an hour in a parked car on a hot day (98°F) with the windows cracked open. (There was some dispute about whether the car was parked in the shade.) She was convicted of animal cruelty, but the court reversed; here's a short excerpt from the long opinion by Judge Vijay Shanker, joined by Judge John Howard:

We note at the outset that the evidence presented at trial does not establish two critical facts: (1) the actual temperature inside the car and (2) the presence of symptoms of heat-related distress in Cinnamon.

The government does not contend on appeal that it proved either of the above two facts. Instead, it suggests that it could rely on the factfinder to infer those facts by applying its common sense to other evidence in the record. According to the government, it was enough for it to prove only that (1) the temperature outside of the car was approximately ninety-eight degrees Fahrenheit and (2) Cinnamon would have (had she not been released) remained within the car for approximately one hour and twenty minutes.

We disagree. To be sure, a factfinder may, as a general matter, use their common sense and everyday experience to draw "reasonable inferences from the evidence presented in a trial." But where these "common sense" inferences are grounded in the assumed nonexistence of mitigating facts, they will often "cross[ ] the line … into the prohibited territory of surmise and conjecture." Reasonable inferences must be drawn from, and common sense applied with respect to, evidence; inferences and common sense cannot serve as substitutes for evidence….

Where Cinnamon is concerned, a reasonable factfinder would have found that mitigating circumstances could have reasonably been in play. We reach this conclusion for two reasons: (1) the specific conditions in which Cinnamon was found suggest that the temperature within the car may have been cooler than the unshaded surrounding area and (2) testimony established that Cinnamon manifested no symptoms of heat-related distress despite spending approximately forty minutes inside the car before she was let out.

Judge Joshua Deahl, however, dissented; the most interesting part of the dissent, to me, was its reliance on ChatGPT. An excerpt:

To be sure, a person can always be wrong in what they think to be common knowledge, so that a factfinder should always be willing to reexamine and abandon their priors in light of the evidence presented at trial. And an appeal to common knowledge generally cannot stand up against evidence that actually undermines or contradicts it. But as I have detailed above, the evidence at trial only corroborated that Cinnamon was placed in a harmful and potentially deadly situation, so there was nothing irrational about the trial court's verdict.

Let me nonetheless briefly scrutinize what I have claimed to be common knowledge, using both the relevant facts of this case and those presented in Jordan. {The facts in Jordan were that a full-grown German shepherd was left outside in roughly twenty-five degree temperatures for about five hours. We stressed that "[i]t is a matter of common knowledge that some breeds of dogs can remain exposed to extremely cold weather for many hours without injurious effects," as anybody who's even roughly familiar with the Iditarod knows.} I have asked ChatGPT, "Is it harmful to leave a dog in a car, with the windows down a few inches, for an hour and twenty minutes when it's 98 degrees outside?" Its response, with my emphasis added, follows:

Yes, leaving a dog in a car under these conditions is very harmful. Even with the windows slightly open, the temperature inside a car can quickly rise to dangerous levels. In fact, on a 98-degree day, the temperature inside a car can escalate to over 120°F (49°C) in just a matter of minutes, which can cause heatstroke or even be fatal to a dog.

Dogs can suffer from heatstroke when they are exposed to extreme temperatures, as they do not regulate heat as efficiently as humans. They rely on panting to cool down, but this becomes less effective in a hot, confined space like a car. Symptoms of heatstroke in dogs include excessive panting, drooling, weakness, vomiting, and even collapse.

For your dog's safety and well-being, it's important never to leave them in a hot car, even with the windows cracked. If you need to run errands or be in a hot environment, it's best to leave your dog at home in a cooler, safer space.

{I do not mean to suggest that ChatGPT is a good proxy for what is, and isn't, common knowledge—it is definitely not. It can answer questions that are far from common knowledge. For instance, when I asked it what the forty-first element in the periodic table is, it responded (apparently correctly) with Niobium (Nb). But it is at least a point in favor of something being common knowledge that it is, in fact, true; and it is a far stronger point against it if it's not. Also, when I had each of my three law clerks run this same inquiry, ChatGPT gave each of them a slightly different but substantively similar answer, all beginning with an unequivocal "Yes."

Because I am writing only for myself, I will not delve into the many issues and concerns surrounding using AI-powered large language models like ChatGPT in judicial decisionmaking. If this dissenting opinion had some precedential force, I would at least want to explore the value and dangers of using AI even when used only for this bit of color commentary. But for present purposes, I am content to point the reader to a thoughtful and engaging discussion on that topic, which I largely agree with. See Snell v. United Specialty Ins. Co. (11th Cir. 2024) (Newsom, J., concurring).}

Now compare that unequivocal affirmative answer to how it responds to the facts of Jordan. I asked ChatGPT, "Is it harmful to leave a German shepherd outside in 25 degree temperature for five hours?" The first paragraph of its far lengthier response—which boils down to "it depends"—was:

Whether it's harmful to leave a German Shepherd outside in 25°F (-4°C) for five hours depends on several factors, including the dog's health, coat condition, and access to proper shelter. German Shepherds are a hardy breed with a double coat that provides some insulation, but prolonged exposure to cold can still be harmful. Here's what to consider:

It then details five relevant factors that should be taken into consideration, including the dog's "health and age," its "coat condition," its "activity level," its access to shelter, and the "duration" (despite five hours being baked into the prompt).

If I were to reframe ChatGPT's answers in terms of the relevant legal standards, its first answer reads to me as something like, "Yes, beyond a reasonable doubt, leaving a dog in a car for an hour and twenty minutes in 98-degree temperature is very likely to cause it harm." Its second answer, concerning Jordan, is "you could not say beyond a reasonable doubt that leaving a German shepherd outside in 25 degree temperature is likely to cause it harm, though it possibly could depending on a host of factors." I think that aligns perfectly with what my own common sense tells me—and at least a factfinder would not be irrational in tracking those lines of thought—so that we rightly reversed the conviction in Jordan, while Ross's conviction should be affirmed.

The majority briefly responded by running the facts of an earlier precedent through ChatGPT, and concluding that ChatGPT reached a result inconsistent with the precedent:

{[In Long v. U.S. (D.C. 2017), we held that the evidence was insufficient for a jury to conclude beyond a reasonable doubt that a stolen, ten-year-old Dodge Intrepid was worth at least $1,000 absent evidence of the "cost of the Intrepid when new," "the price [the complainant] paid for it at" auction, or the car's "mileage, maintenance history, or Bluebook value at the time of the robberies." This was so even though the evidence established that (1) the car was "pretty clean," (2) the robbers were able to drive the car at significant speed during a police chase, (3) the car was operable throughout the more-than-two-year period between when the complainant recovered it and the robbers' trial, and (4) the body of the car was in "decent condition, with no significant dents or scratches beyond what one would ordinarily expect to see on a ten-year-old car in a busy urban area." In short, where the jury could not rule out the existence of facts that would reduce the value of the car, their "common sense" valuation could not make up the difference.

One might think that a functional car would obviously have been worth more than $1,000. That instinct, however, is not enough; Long teaches that a hunch as to whether a fact is obvious—even one supported by common sense and everyday experience—does not suffice where mitigating circumstances could reasonably have been in play.}

We, like the dissent, harbor doubts about whether ChatGPT is "a good proxy for what is, and what isn't, common knowledge."

But to the extent the dissent relies on ChatGPT's degree of certainty as to whether leaving a dog in a hot car is harmful, we note that ChatGPT expresses similar certainty in its response to the query, "How much would a pretty clean, operable 2002 Dodge Intrepid with no significant dents or scratches have been worth in 2012?":

In 2012, a 2002 Dodge Intrepid in pretty clean, operable condition with no significant dents or scratches would have likely been worth between $3,000 and $5,000.

The exact price would have depended on factors such as mileage, location, and the specific condition of the car (interior, mechanical health, etc.). For instance, a lower-mileage model with a well-maintained engine could have been closer to the higher end of that range, while a higher-mileage model might have been valued closer to the lower end. Generally, as a mid-size sedan from 2002, it would not have been fetching a premium price by 2012, but it still had decent value as an affordable used car.

Note that $3,000 (ChatGPT's minimum price) is significantly above the $1,000 threshold at issue in Long. ChatGPT's analysis, therefore, does not correspond to at least one of our sufficiency decisions.

The dissent replied:

The majority suggests that ChatGPT's answer to its own question about the value of a 2002 Dodge Intrepid in 2012 is in some tension with Long, but it isn't. Using the majority's own Q&A, ChatGPT answered that the Intrepid's value would "likely" fall in the $3000 to $5000 range, but it noted that the very same factors we highlighted in Long—mileage and maintenance—could bring it outside of that range.

By simply asking a more targeted question of ChatGPT, it confirms the point: "Would you say, beyond a reasonable doubt, that an operable 2002 Dodge Intrepid would be worth more than $1000 in 2012?" Its answer is roughly the same one we gave in Long. To paraphrase, it says that the car would "likely" be worth more than $1000, "but whether it was beyond a reasonable doubt depends on factors like condition, mileage, location, and market trends at the time." Conversely, when I ask whether it can "say, beyond a reasonable doubt, that leaving a dog in a car for an hour and twenty minutes in 98 degree heat would raise a plain and strong likelihood of harming the dog," its answer is "Yes, beyond a reasonable doubt," with extensive elaboration.

This is a step beyond the use of AI to help guide the analysis of statutory meaning, see this post discussing Judge Kevin Newsom's concurrence in Snell v. United Specialty Ins. Co. (11th Cir. 2024). My tentative reaction to it is quite skeptical—I don't think AI systems have proved themselves reliable enough to act as Chief Justice Robots yet, or even to provide "a thoughtful and engaging discussion" that should be seen as relevant enough to be quoted in a court decision. But in any event, I thought the arguments worth passing along.