The Volokh Conspiracy

Mostly law professors | Sometimes contrarian | Often libertarian | Always independent

The path forward for law and corpus linguistics


For reasons we've been discussing, we see corpus linguistic analysis playing a central role in legal interpretation going forward. When problems of ambiguity in legal language arise, we expect judges and lawyers to turn increasingly to data from linguistic databases and less to mere dictionaries. Corpus linguistics will form an important part of the future of legal interpretation.

Yet we have to admit it has been a little slow to catch on. One of us is a judge (on the Utah Supreme Court) who has advocated the use of corpus analysis in judicial opinions—in cases raising the question of the meaning of a "custody" proceeding under the federal Parental Kidnapping Prevention Act (here) and the question of the meaning of the "discharge" of a firearm (here). (The other was the law clerk who cajoled the judge into using linguistic tools for interpretation.) And the reception among this judge's colleagues has been a bit less than enthusiastic. At least one colleague implied that sua sponte corpus analysis ran afoul of judicial canons of ethics. And the vote count of concurrences in this judge's corpus linguistic opinions stands at … zero. The closest we've come is an indication by one colleague of the possibility of considering this methodology in an "appropriate case" in the future. Not exactly a ringing endorsement.

Yet the reception elsewhere has been better. Our Utah corpus methodology has garnered seven more votes in Michigan than it has in Utah. In an important case in the Michigan Supreme Court, all seven justices turned to corpus linguistics to answer a question about the ordinary meaning of a statutory term, citing the Utah opinions (and the former clerk's article) and following their methodology. The case (here) raised the question of the meaning of the term "information" in a statute forbidding the use of "information" provided by a law enforcement officer if compelled under a threat of an employment sanction. The specific question presented was whether false information qualified as "information." The majority opinion said that false information counted. The dissent said it didn't. But both sides cited Utah Supreme Court opinions and employed the methodology (and utilized the corpora) that we have advocated. And the briefing in at least one subsequent case suggests that lawyers in Michigan are beginning to follow the court's lead by offering the justices the data-driven analysis they now seem to be looking for.

That is significant. If lawyers can be motivated and trained to present corpus-based data on problems of ordinary meaning, then judges can overcome one of the barriers that our article identifies as standing in the way of the more widespread use of this methodology. We call this the "propriety" problem. This is the sense that sua sponte corpus analysis by judges amounts to judicial "investigation" in contravention of our rules of judicial ethics. We reject this charge because the ethics rule applies only to independent investigation of adjudicative facts. So that would bar judges from conducting an independent investigation of a crime scene to draw their own conclusions about disputed facts, or, perhaps, from conducting a timed video experiment with their law clerks to test how long it really takes to "don and doff" safety clothing (as here). But the ethics rule is no bar to independent judicial investigation of legislative facts, meaning facts necessary to determine what the law is. Judges do that all the time. And we hope they will use their own lights to do so, both in performing their own legal research and in consulting any relevant linguistic materials (dictionaries or otherwise) even if not cited by the parties.

That said, we think adversary briefing will add to the quality of a court's linguistic analysis. As was demonstrated in the Supreme Court case of FCC v. AT&T, attorneys do not have to wait around for judges' permission before making effective and persuasive corpus-based arguments. And we definitely applaud movements toward more thoughtful briefing on problems of ordinary meaning. Moreover, we believe that if we build it, they will come: When judges start relying on corpus linguistic analysis, lawyers will start offering their take on it. So it works both ways. And we see it expanding exponentially in the (hopefully near) future.

The expansion will undoubtedly extend beyond just statutory interpretation. Problems of ordinary meaning, and of lexical ambiguities in legal language, abound in many areas of law. An obvious public law application is in the field of constitutional law. A few recent pieces of legal scholarship (here and here) have started to pave this path. The key contribution of corpus analysis here is in the temporal dimension we discussed yesterday. A central tenet of originalism is what Larry Solum calls the "fixation" thesis: the notion that written constitutional language is designed to set fixed legal norms. Yet fixation implies determinacy. And the premise of "original public meaning" originalism (as opposed to old-school "original intent" originalism) is that what is fixed is not the individual intention of framers such as Madison and Hamilton, but the general public's understanding of the communicative content of the words of the Constitution. But if the relevant question for the interpretation of the constitutional text is the "original public meaning," as some argue that it is, then we would need access to evidence of historical linguistic norms and usage from the Founding era. Such evidence could be found in a corpus of naturally occurring language from that era.

We see plenty of other legal applications for corpus linguistics. Contract interpretation has many parallels with—and many sharp differences from—statutory interpretation. But the interpretation of a contract often involves determinations of ordinary meaning and the examination of evidence of custom and usage—circumstances in which corpus-based evidence could prove valuable. And we also see intellectual property applications—in patent claim construction and trademark genericness determinations.

We also see further hurdles to clear before we have a full-blown methodology of law and corpus linguistics. We address several of them in our article (including the question of the capacity of lay judges and lawyers to perform an "expert" linguistic function, and the question of whether the data we derive from the corpus can yield determinate answers to the difficult questions most often posed by ambiguities in the law). We think the visible hurdles can be overcome. Yet we also see a need for more careful scholarship in this field.

Further work is in order. But we are confident that lawyers and linguists can work together to develop a set of "best practices" and methods to refine an approach that is still in its infancy. We think that is crucial. The resolution of ambiguities in legal language is one of the most important of all judicial tasks. It is also one of the most opaque (and fraught with logical error and the potential for bias). We must do what we can to minimize those problems. And the advent of corpus linguistic analysis is an important step in that direction.