The Volokh Conspiracy
Mostly law professors | Sometimes contrarian | Often libertarian | Always independent
AI Hallucination Case Stemming from Use of a Paralegal's AI-Based Research
I blogged yesterday about AI hallucinations in court filings by prominent law firms, as well as a nonexistent source cited in an expert's declaration (the expert works for leading AI company Anthropic, though at this point it's not yet clear whether the error stemmed from an AI hallucination or from something else). But I thought I'd blog a bit more in the coming days about AI hallucinations in court filings, just to show how pervasive the problem is: I'm seeing court decisions about this every few days. And those decisions are likely just the tip of the iceberg, since many hallucinations won't be noted in court decisions, and the great majority of court decisions are state trial court decisions that don't show up on Westlaw.
Here's the first case that showed up in my daily WestClip query related to the subject, an opinion yesterday by Judge Rachel Kovner (E.D.N.Y.) in Ramirez v. Humala:
Plaintiff Ramirez's response letter, … signed by [her lawyer,] Ms. Stillman, cites eight cases. Of the eight, the Court was unable to locate four: (1) London v. Polish Slavic Fed. Credit Union, No. 19-CV-6645, 2020 WL 1275619 (E.D.N.Y. Mar. 17, 2020); (2) Rosario v. 2022 E. Tremont Hous. Dev. Fund Corp., No. 21-CV-9010, 2023 WL 1993700 (S.D.N.Y. Feb. 14, 2023); (3) Paniagua v. El Gallo No. 3 Corp., No. 22-CV-7073, 2024 WL 1046856 (S.D.N.Y. Mar. 11, 2024); and (4) Luna v. Gon Way Constr., Inc., No. 20-CV-893, 2022 WL 826856 (S.D.N.Y. Mar. 18, 2022). The Court ordered Ms. Stillman to either provide copies of the four cases or show cause why she should not be sanctioned for citing nonexistent cases pursuant to Federal Rule of Civil Procedure 11(b) and the inherent power of the Court.
Stillman explained that this originated in the work of her paralegal:
Ms. Ruiz[, Ms. Stillman's paralegal,] explains that she took the initiative to assist with the legal research for the reply and generated the citations at issue "using secondary tools, including public search resources and AI-based research assistants." Ms. Ruiz did not check that the citations she had generated corresponded to real cases before passing them to Ms. Stillman. Ms. Stillman, in turn, relied on Ms. Ruiz, "a diligent and trusted member" of the team, and "did not independently verify each citation prior to filing."
Federal Rule of Civil Procedure 11(b)(2) provides:
By presenting to the court a pleading, written motion, or other paper—whether by signing, filing, submitting, or later advocating it—an attorney … certifies that to the best of the person's knowledge, information, and belief, formed after an inquiry reasonable under the circumstances: … the claims, defenses, and other legal contentions are warranted by existing law or by a nonfrivolous argument for extending, modifying, or reversing existing law or for establishing new law.
"A fake opinion is not 'existing law' and citation to a fake opinion does not provide a non-frivolous ground for extending, modifying, or reversing existing law, or for establishing new law." Accordingly, the Second Circuit, as well as district courts across the country, have found the submission of nonexistent case citations in filings to the court to constitute sanctionable conduct under Rule 11(b)(2). See, e.g., Park v. Kim, 91 F.4th 610, 613–16 (2d Cir. 2024); Mata v. Avianca, Inc., 678 F. Supp. 3d 443, 461 (S.D.N.Y. 2023); Gauthier v. Goodyear Tire & Rubber Co., No. 23-CV-281, 2024 WL 4882651, at *2–3 (E.D. Tex. Nov. 25, 2024); Wadsworth v. Walmart Inc., 348 F.R.D. 489, 495–96 (D. Wyo. 2025); Dehghani v. Castro, No. 25-CV-52 (MIS) (DLM), 2025 WL 988009, at *4 (D.N.M. Apr. 2, 2025); Nguyen v. Savage Enters., No. 24-CV-815 (BSM), 2025 WL 679024, at *1 (E.D. Ark. Mar. 3, 2025); Bevins v. Colgate-Palmolive Co., No. 25-CV-576, 2025 WL 1085695, at *7 (E.D. Pa. Apr. 10, 2025); Benjamin v. Costco Wholesale Corp., No. 24-CV-7399 (LGD), 2025 WL 1195925, at *6–8 (E.D.N.Y. Apr. 24, 2025).
Ms. Stillman's conduct violated Rule 11. "Rule 11 explicitly and unambiguously imposes an affirmative duty on each attorney to conduct a reasonable inquiry into the viability of a pleading before it is signed." A reasonable inquiry means, at minimum, that attorneys must "read, and thereby confirm the existence and validity of, the legal authorities on which they rely." … "[N]o inquiry cannot be deemed objectively reasonable even if the reliance is placed in an experienced attorney." … Ms. Stillman failed to conduct this minimal inquiry.
Moreover, Ms. Stillman acted with subjective bad faith, as required for the imposition of sua sponte sanctions under Rule 11. Ms. Stillman's failure to make any inquiry at all into the veracity of the cases provided by her paralegal was so "completely without merit" that subjective bad faith may be inferred. See, e.g., Mata, 678 F. Supp. 3d at 464 (finding subjective bad faith where attorney signed a filing despite "not reading a single case cited … and taking no other steps on his own to check whether any aspect of the assertions of law were warranted by existing law"); Benjamin, 2025 WL 1195925, at *7–8 (similar); cf. United States v. Cohen, 724 F. Supp. 3d 251, 258–59 (S.D.N.Y. 2024) (declining to find subjective bad faith where counsel did not read certain citations but sincerely believed they had been provided by co-counsel who had reviewed them).
In similar cases where sanctions have been imposed on attorneys for the submission of nonexistent, AI-generated cases, monetary sanctions have ranged from $1,000 to $5,000. See, e.g., Mata, 678 F. Supp. 3d at 466 ($5,000 sanction); Gauthier, 2024 WL 4882651, at *3 ($2,000 sanction); Wadsworth, 348 F.R.D. at 498 ($3,000 sanction for the most culpable attorney; $1,000 sanctions each for two less culpable attorneys); Dehghani, 2025 WL 988009, at *5 ($1,500 sanction); Nguyen, 2025 WL 679024, at *1 ($1,000 sanction); Benjamin, 2025 WL 1195925, at *9 ($1,000 sanction). Courts have also imposed nonmonetary sanctions including referral for attorney discipline, see, e.g., Park, 91 F.4th at 616, revocation of an attorney's pro hac vice admission, see, e.g., Wadsworth, 348 F.R.D. at 498, and mandatory CLE training on the use of AI in the legal field, see, e.g., Gauthier, 2024 WL 4882651, at *3.
A sanction on the low end of the spectrum is warranted here as a result of Ms. Stillman's prompt admission of error, her and Ms. Ruiz's apologies to the Court, and Ms. Stillman's averment that she is "conducting a full internal review of our citation practices and research protocols" to ensure that "any research-based citations in future drafts" will be "pulled directly from Westlaw or PACER and verified by a supervising attorney." Given these circumstances, a sanction of $1,000 is sufficient but not more than necessary to serve the deterrence purposes of Rule 11.
The court also concluded that the law firm was jointly responsible for the sanction; that's not practically important here, because the firm appears to be Ms. Stillman's own, but it may be relevant in other cases. And the court also noted that, "[c]onsistent with common practice, Ms. Stillman shall also inform her client … of these developments by serving a copy of this order on her client."
It's not a "hallucination", it's a programming error. The pattern matching algorithm has assembled text fragments that do appear together but never in the same context. For example, what kind of big apple would the Big Apple be? Context matters. Call this kind of error what it is: a bug.
I don't use the term "Artificial Intelligence" anymore for the same reason. These algorithms are neither artificial nor are they intelligent. They are simply programming via data affinity. I prefer the term "automated inference" machine.
I guess we should stop being so hard on students for AIing their way through school if everybody else does it for everything now. You can even cheat your way to a high-paying job at an AI specialist company, which should be expert at not keeping around anyone its AI could just replace.
The AI systems are self-improving by means of analyzing the growing information bases that are available on the Internet. The autogenerated knowledge bases are stored in the source management system.
Yet, artificial intelligence often seems like artificial stupidity. I think we want to create synthetic sapience.
ChatGPT suffered a wild hallucination while I was using it in legal research.
Below I was directing questions to ChatGPT about Juárez v. Select Portfolio Servicing, Inc., 708 F.3d 269, 276–77 (1st Cir. 2013).
I said: Could this Conclusion be improved? Juarez seems important because of the prior litigation.
ChatGPT said:
Yes, your instinct is right—referencing Juarez v. Harvard Crimson in your Conclusion can significantly strengthen your case by illustrating factual continuity, patterned conduct, and Harvard Crimson’s history of discriminatory message control, particularly when it involves viewpoint-based exclusion.
Here is an improved version of your Conclusion that integrates the Juarez context and reinforces the procedural and doctrinal stakes:
CONCLUSION
This case presents unresolved legal questions at the intersection of contract law, public message transmission, common carriage doctrine, and constitutional accountability. The Harvard Crimson operated a public-facing comment platform that invited user participation under uniform conditions, accepted the benefits of public engagement, and then suppressed speech based on viewpoint. Under settled principles of Massachusetts common law and statutory common carriage obligations, such conduct supports both breach of contract and nondiscrimination claims.
Dismissal at the pleading stage was improper. The platform’s legal status, its relationship to state functions, and the applicability of Section 230 are all contested and fact-dependent. These questions cannot be resolved without discovery and adversarial development of the record. Section 230 is an affirmative defense that does not override voluntarily assumed contractual duties or immunize discriminatory message carriage that would otherwise violate state law or constitutional norms.
Importantly, this is not the first time Harvard Crimson has been accused of excluding speech based on the political or ethnic identity of the speaker. In Juarez v. Harvard Crimson, the newspaper was sued for allegedly refusing to print ads by Latino community advocates because of viewpoint and ethnic association. That case—while ultimately dismissed—reveals a troubling pattern of discriminatory gatekeeping by a platform that presents itself as open to the public.
Because the complaint alleges plausible claims grounded in contract, common carriage, and constitutional law, and because no binding precedent forecloses these theories, Plaintiff-Appellant respectfully requests that this Court reverse the judgment below and remand for further proceedings, including leave to amend the complaint to further clarify the legal and factual basis of the claims.
Would you like a version that footnotes the Juarez citation or adds a parenthetical to describe its procedural posture?
I said:
What is a proper citation for Juarez v. Harvard Crimson?
ChatGPT said:
I understand you're seeking the proper legal citation for Juarez v. Harvard Crimson. However, after thorough research, I couldn't locate a published case under that title. It's possible that the case was filed under a different name, dismissed early in the process, or not reported in publicly accessible legal databases.
If you have additional details about the case—such as the full names of the parties involved, the court in which it was filed, the docket number, or the year of filing—I would be glad to assist further in locating the correct citation.
ChatGPT would certainly do a better job than Martillo.
My initial thoughts, for whatever they’re worth:
1. The first fake case is a secondary citation in support of the claim that jurisdictional allegations can't be rejected for purposes of a motion to dismiss. That seems like it's probably right (although I didn't actually read any other documents so I don't know the full context), and is accompanied by a citation to a real Second Circuit case that more or less says that, although I would have tried to find something that articulated it a little more precisely. That said, I don't know why you'd throw in the fake case (which purported to be a district court decision) even if it was real.
2. The second and third cases are both offered in support of the proposition that courts “consistently” refuse to consider unsigned tax returns as evidence. That seems like something that could be true, but if your only support is two district court cases, I don’t see how you wouldn’t look them up to see the details of what happened before filing.
3. The fourth case is offered to support the proposition that a court can’t dismiss a case because the plaintiff is unlikely to prove damages sufficient to meet a jurisdictional threshold. To the extent that’s right (again, not sure about the details of the context), I can’t imagine only citing district court cases for it. (There’s a second citation to a real district court opinion, but unless I’m missing something it doesn’t even relate to the issue, at least on the pincited page.)
4. To add an anecdote: I've recently used Claude for legal research of a sort (not for actual cases, to be clear, just questions that occurred to me and I was curious about and didn't have an obvious way to find answers through traditional research methods) three times. The first time it immediately gave me a real Supreme Court case that I feel conclusively resolved the question, though it would require some analogical reasoning; I would guess that there's a ~75% chance I would have found the case in five to ten minutes, at which point I probably would have moved on if I hadn't. On the second one, it gave me a bunch of real cases that I'd already found that didn't quite relate to the issue I was actually interested in. And on the third it gave me a bunch of fake district court decisions, but in trying to verify them I did find a real case that was directly on point.
5. What’s with New York making people pretend that their filings are letters to the judge? So silly.
Before the advent of printed reporters in the 16th century, holograph notes, traded among lawyers and judges, were the best source for the rule of each case and the weight it was given. (The records go back much further, but they mainly record pleading tactics.) Interestingly, the notion of binding specific precedent only really took hold when everyone could consult the same text.
The short-term fix to these complete hallucinations (which are actually kind of interesting) is obviously some canonical citation database or list of case names, which I assume the in-house operations use. The next level might be some yet-unthought adversarial AI against the naive LLM to somehow use challenges to distinguish true parsings from false parsings.
But my guess is that, given the sums involved, the in-house solutions are an order of magnitude above the public discussion. You could train an AI for each judge in the circuit, in order to know how to write the brief based on their opinions, or a chat bot to do the same when practicing for oral arguments--you could train the latter on the transcripts as well. I saw an article yesterday about an application that used a 'bag of words' corpus approach to mine ECHR fact patterns, and predict decisions at around 75%.
But given the means now at hand for using data, there's no reason for every state or circuit not to maintain a canonical database of the full texts of all published decisions. Make it a nightly build on github. A common source enables binding precedent. Too many Cokes spoils the stew.
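The short-term fix described above can be sketched in a few lines. This is a hypothetical illustration, not a real court system: the citation pattern is simplified and the canonical database here is just an in-memory set, standing in for whatever full-text repository a circuit might publish.

```python
# Hypothetical sketch: flag citations in a filing that do not appear in a
# canonical database of published decisions. The regex and data format are
# simplified assumptions for illustration only.
import re

# Matches simplified reporter citations like "678 F. Supp. 3d 443" or
# "2020 WL 1275619" (volume, reporter abbreviation, first page).
CITE_RE = re.compile(r"\b\d{1,4}\s+(?:F\.\s*Supp\.\s*3d|F\.4th|F\.3d|WL)\s+\d+\b")

def extract_citations(text):
    """Pull citation-shaped strings out of a filing."""
    return set(CITE_RE.findall(text))

def verify(filing_text, canonical_citations):
    """Return the citations that do NOT appear in the canonical set."""
    return extract_citations(filing_text) - canonical_citations

# Toy canonical set; a real one would be built from the published reporters.
canonical = {"678 F. Supp. 3d 443", "91 F.4th 610"}
brief = "See Mata v. Avianca, Inc., 678 F. Supp. 3d 443; also 2020 WL 1275619."
print(sorted(verify(brief, canonical)))  # the WL cite is flagged as unverified
```

A check this crude would already have caught every hallucinated case in the opinion above, since each one carries a citation that resolves to nothing; the hard part is maintaining the canonical list, which is why a nightly public build makes sense.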
Mr. D.
The thing that has been amazing to me about these AI cases is not that people use AI; it's how many attorneys apparently don't bother to actually read the cases they cite. They just find something that sounds good and drop it into their brief, without actually confirming that it is on point and supports the position they are taking. I guess they just assume that judges and opposing counsel don't read the cases either. Which makes me wonder how many attorneys weren't bothering to actually read cases they found on Westlaw and in printed volumes before AI came along; maybe AI has merely exposed a problem that has always existed.
Oh, and blaming my paralegal to the judge would never enter my head. This attorney should be doubly ashamed of herself.
"Which makes me wonder how many attorneys weren't bothering to actually read cases they found on Westlaw and in printed volumes before AI came along"
Many, I am sure. And that's intolerable, yes. But at least if you find a case on Westlaw, you know it's a real case.
By the way, years ago there was a judge in Ohio who wanted to write about an issue, so he fabricated a case that didn't exist and then published an opinion that made its way into the NE Reporter. When it was discovered, he was disbarred and the opinion was withdrawn from publication. Which is not the same kind of phony case we're discussing here, but still amusing.
Cite? I'm not saying you're lying; I'd just like to read more about that.