Chief Justice Robots

What should it take for us to accept AIs as judges?

The Volokh Conspiracy

I have an unusually speculative article—more futurism than law as such—coming out in a few months in the Duke Law Journal, called Chief Justice Robots. I'd love to hear what people think. Here are the Introduction and the Conclusion; you can read the full article here:

Introduction

How might artificial intelligence change judging? IBM's Watson can beat the top Jeopardy players in answering English-language factual questions. The Watson Debater project is aimed at creating a program that can construct short persuasive arguments. What would happen if an AI program could write legal briefs and judicial opinions?

To be sure, AI legal analysis is in its infancy; prognoses for it must be highly uncertain. Maybe there will never be an AI program that can write a persuasive legal argument of any complexity.

But it may still be interesting to conduct thought experiments, in the tradition of Alan Turing's famous speculation about artificial intelligence, about what might happen if such a program could be written. Say a program passes a Turing test, meaning that it can converse in a way indistinguishable from a human. Perhaps it can then converse—or even present an extended persuasive argument—in a way indistinguishable from the sort of human we call a "lawyer," and then perhaps in a way indistinguishable from a judge.

In this Article, I discuss in more detail such thought experiments and introduce four principles—perhaps obvious to many readers, but likely controversial to some—that should guide our thinking on this subject:

[1.] Evaluate the Result, Not the Process. When we're asking whether something is intelligent enough to do a certain task, the question shouldn't be whether we recognize its reasoning processes as intelligent in some inherent sense. Rather, it should be whether the outcome of those processes provides what we need.

If an entity performs medical diagnoses reliably enough, it's intelligent enough to be a good diagnostician, whether it is a human being or a computer. We might call it "intelligent," or we might not. But, one way or the other, we should use it. Likewise, if an entity writes judicial opinions well enough—more, shortly, on what "well" means here—it's intelligent enough to be a good AI judge. (Mere handing down of decisions, I expect, would not be enough. To be credible, AI judges, even more than other judges, would have to offer explanatory opinions and not just bottom-line results.)

This, of course, is reminiscent of the observation at the heart of the Turing Test: if a computer can reliably imitate the responses of a human—the quintessential thinking creature, in our experience—in a way that other humans cannot tell it apart from a human, the computer can reasonably be said to "think." Whatever goes on under the hood, thinking is as thinking does.

The same should be true for judging. If a system reliably yields opinions that we view as sound, we should accept it, without insisting on some predetermined structure for the process. Such a change would likely require changes to the federal and state constitutions. But, if I am right, and if the technology passes the tests I describe, then such changes could indeed be made.

[2.] Compare the Results to Results Reached by Humans. The way to practically evaluate results is the Modified John Henry Test, a competition in which a computer program is arrayed against, say, ten average performers in some field—medical diagnosis, translation, or what have you. All the performers would then be asked to execute, say, ten different tasks—for instance, the translation of ten different passages.

Sometimes this performance can be measured objectively. Often, it can't be, so we would need a panel of, say, ten human judges who are known to be expert in the subject—for example, experienced doctors or fluent speakers of the two languages involved in a translation. Those judges should evaluate everyone's performance without knowing which participant is a computer and which is human.

If the computer performs at least as well as the average performer, then the computer passes the Modified John Henry Test.[1] We can call it "intelligent" enough in its field. Or, more to the point, we can say that it is an adequate substitute for humans.[2]
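The scoring step of such a test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the anonymous labels, the 1-10 scale, and the made-up scores are all hypothetical, and comparing mean scores is just one possible decision rule. The example also uses fewer participants and tasks than the ten suggested above, to keep it short.

```python
from statistics import mean


def passes_modified_john_henry(blind_scores, machine_label):
    """Return True if the machine scores at least as well as the average human.

    blind_scores maps an anonymous participant label to the list of scores the
    panel gave that participant, one per task. The panel never sees which label
    is the machine; machine_label is revealed only here, after scoring.
    """
    # Overall score per participant: the mean of that participant's task scores.
    overall = {label: mean(task_scores) for label, task_scores in blind_scores.items()}
    # The benchmark is the average of the human participants only.
    human_overall = [score for label, score in overall.items() if label != machine_label]
    return overall[machine_label] >= mean(human_overall)


# Hypothetical panel scores: three humans and one machine, three tasks each.
example_scores = {
    "P1": [6, 7, 5],
    "P2": [8, 6, 7],
    "P3": [5, 5, 6],
    "P4": [7, 6, 7],  # the machine, a fact hidden from the panel
}
result = passes_modified_john_henry(example_scores, "P4")
```

Note that the benchmark is the human average, not the best human: in the made-up data, the machine passes even though participant P2 outscores it.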

I label the test the Modified John Henry Test because of what I call the Ordinary Schlub Criterion. As I noted above, a computer doesn't have to match the best of the best; it just has to match the performance of the average person whom we are considering replacing.

Self-driving cars, to offer an analogy, do not have to be perfect to be useful—they just have to match the quality of ordinary drivers, and we ordinary drivers don't set that high a bar. Likewise, translation software just has to match the quality of the typical translator who would be hired in its stead.[3] Indeed, over time we can expect self-driving cars and translation software to keep improving as the technology advances; the humans' average, on the other hand, is not likely to improve, or at least to improve as fast. But even without such constant improvement, once machine workers are as good as the average human workers, they will generally be good enough for the job.

Indeed, in the John Henry story, Henry's challenge was practically pointless, though emotionally fulfilling. Even if John Henry hadn't laid down his hammer and died at the end, he would have just shown that a team of John Henrys would beat a team of steam drills. But precisely because John Henry was so unusually mighty, the railroad couldn't hire a team of workers like him. The railroad only needed something that was faster than the average team—or, more precisely, more cost effective than the average team.[4] Likewise for other technologies: to be superior, they merely need to beat the human average.

Now, in some contexts, the ordinary schlub may be not so schlubby. If you work for a large company with billions at stake in some deal, you might hire first-rate translators—expensive, but you can afford them. Before you replace those translators with computer programs, you would want to make sure that the program beats the average translator of the class that you hire. Likewise, prospective AI Supreme Court Justices should be measured against the quality of the average candidates for the job—generally experienced, respected appellate judges—rather than against the quality of the average candidate for state trial court.

Nonetheless, the principle is the same: the program needs to be better than the average of the relevant pool. It doesn't need to be perfect, because the humans it would replace aren't perfect. And because such a program is also likely to be much cheaper, quicker, and less subject to certain forms of bias, it promises to make the legal system not only more efficient but also fairer and more accessible to poor and middle-class litigants.

[3.] Use Persuasion as the Criterion for Comparison—for AI Judges as Well as for AI Brief-Writers. Of course, if there is a competition, we need to establish the criteria on which the competitors will be measured. Would we look at which judges' decisions are most rational? Wisest? Most compassionate?

I want to suggest a simple but encompassing criterion, at least for AI judges' judgment about law and about the application of law to fact: persuasion. This criterion is particularly apt when evaluating AI brief-writer lawyers. After all, when we hire a lawyer to write a brief, we want the lawyer to persuade—reasonableness, perceived wisdom, and appeals to compassion are effective only insofar as they persuade. But persuasion is also an apt criterion, I will argue, for those lawyers whom we call judges. (The test for evaluation of facts, though, whether by AI judges, AI judicial staff attorneys, or AI jurors, would be different; I discuss that in Part IV.)

If we can create an AI brief-writer that can persuade, we can create an AI judge that can (1) construct persuasive arguments that support the various possible results in the case, and then (2) choose from all those arguments the one that is most persuasive, and thus the result that can be most persuasively supported. And if the Henry Test evaluator panelists are persuaded by the argument for that result, that means they have concluded the result is correct. This connection between AI brief-writing and AI judging is likely the most controversial claim in the paper.

[4.] Promote AIs from First-Draft-Writers to Decisionmakers. My argument starts with projects that are less controversial than AI judges. I begin by talking about what should be a broadly accepted and early form of AI automation of the legal process: the use of AI interpreters to translate for non-English-speaking witnesses and parties. I then turn to AI brief-writing lawyers—software that is much harder to create, of course, but one that should likewise be broadly accepted, if it works.

From there, I argue that AI judicial staff attorneys that draft proposed opinions for judges to review—as well as AI magistrate judges that write reports and recommendations rather than making final decisions—would be as legitimate and useful as other AI lawyers (again, assuming they work). I also discuss AIs that could help in judicial fact-finding, rather than just law application.

And these AI judicial staff attorneys and magistrates offer the foundation for the next step, which I call the AI Promotion: If we find that, for instance, AI staff attorneys consistently write draft opinions that persuade judges to adopt them, then it would make sense to let the AI make the decision itself—indeed, that can avoid some of the problems stemming from the human prejudices of human judges. I also discuss the possible AI prejudices of AI judges, and how they can be combatted.

Just as we may promote associates to partners, or some magistrate judges to district judges, when we conclude that their judgment is trustworthy enough, so we may promote AIs from assistants to decisionmakers. I also elaborate on the AI Promotion as to jurors, and finally move on to the title of this Article: AI judges as law developers.

Indeed, the heart of my assertion in this Article is this: the problem of creating an AI judge that we can use for legal decisions is not materially more complicated than the problem of creating an AI brief-writer that we can use to make legal arguments. The AI brief-writer may practically be extremely hard to create. But if it is created, there should be little conceptual reason to balk at applying the same technology to AI judges within the guidelines set forth below. Instead, our focus should be on practical concerns, especially about possible hacking of the AI judge programs, and possible exploitation of unexpected glitches in those programs; I discuss that in some detail in Part V.C.3.

This, of course, is likely to be a counterintuitive argument, so I try to take it in steps, starting with the least controversial uses of AI: courtroom interpreters (Part I), brief-writing lawyers (Part II), law clerks (Part III), and fact-finding assistants that advise judges on evaluating the facts, much as law clerks do as to the law (Part IV). Then I shift from assistants to actual AI judges (Part V), possible AI jurors (Part VI), and finally AI judges that develop the law rather than just applying it (Part VII); that is where I argue that it makes sense to actually give AIs decision-making authority. It would be a startling step, but, again assuming that the technology is adequate (and that we can avoid an intolerable level of security vulnerabilities), a sound one….

Conclusion

A man calls up his friend the engineer and says, "I have a fantastic idea—an engine that runs on water!" The engineer says, "That would be nice, but how would you build it?" "You're the engineer," the man says, "I'm the idea man."

I realize I may be the joke's "idea man," assuming away the design—even the feasibility of the design—of the hypothetical AI judge. Perhaps, as I mentioned up front, such an AI judge is simply impossible.

Or maybe the technology that will make it possible will so transfigure society that it will make the AI judge unnecessary or irrelevant. If, for instance, the path to the AI judge will first take us to Skynet, I doubt that John Connor will have much time to discuss AI judges—or that Skynet will have much need for them. Or maybe the technical developments that would allow AI judges will produce such vast social changes that they are beyond the speculation horizon, so that it is fruitless to guess about how we will feel about AI judges in such a radically altered world. And in any event, the heroes of the AI judge story will be the programmers, not the theorists analyzing whether Chief Justice Robots would be a good idea. [Footnote: As Sibelius supposedly said, no one has ever built a statue honoring a critic.]

Still, I hope that I have offered a way of thinking about AI judges, if we do want to think about them. My main argument has been that

  • We should focus on the quality of the proposed AI judge's product, not on the process that yields that product.
  • The quality should largely be measured using the metric of persuasiveness.
  • The normative question whether we ought to use AI judges should be seen as turning chiefly on the empirical question whether they reliably produce opinions that persuade the representatives that we have selected to evaluate those opinions.

If one day the programmers are ready with the software, we should be ready with a conceptual framework for evaluating that software.

[1] This doesn't require a unanimous judgment on the part of the panel; depending on how cautious we want to be, we might be satisfied with a majority judgment, a supermajority judgment, or some other decision rule.

[2] In some contexts, of course, automation may be better even if it's not as effective—for instance, it may be cheaper and thus more cost-effective. But if it's cheaper and at least as effective, then it would be pretty clearly superior.

[3] Carl Sagan observed that no computer program "is adequate for psychiatric use today [in 1975], but the same can be remarked about some human psychotherapists." The question is never whether a proposed computer solution is imperfect; it's whether it's good enough compared to the alternative.

[4] It didn't matter if he won, if he lived, or if he'd run.
They changed the way his job was done. Labor costs were high.
That new machine was cheap as hell and only John would work as well,
So they left him laying where he fell the day John Henry died.

Drive-By Truckers, The Day John Henry Died (2004).


98 responses to “Chief Justice Robots”

  1. A better Turing Test of the AI would be whether it reached results that were based on law and precedent, but were deeply immoral, which would give it away as written by the AI.

    1. Because, umm, only an AI could have reached the decision that Taney did? Or Buck, Korematsu, Plessy, Kelo…
      Humans did those F-ups completely on their own.

  2. I see two nearly insurmountable issues.

    The first: it will likely be impossible to prove that an AI-based system is free of prejudice of one sort or another. I might be able to tell, from a judge’s demeanour during a hearing, just what his prejudices are; how can these be detected without exhaustive research for an AI judge?

    The second issue is even worse. Present-day judges may live in a somewhat sheltered bubble, but they interact with their local circle of relatives, friends, and neighbours. They read the newspapers, go to movies, raise children, and experience the frictions of everyday life. They bring some compassion and understanding to the least glamorous levels of judging: everyday petty crime, small claims court, and the like. Robots will lack any such connection with human society, and human society will inevitably either reject these judges and the justice system they administer, or come to resent them even more than they resent remote human overseers in the nation’s capital.

    1. “it will likely be impossible to prove that an AI-based system is free of prejudice of one sort or another”

      We can’t prove that a human-based system is free of prejudice either.

      “I might be able to tell, from a judge’s demeanor during a hearing, just what his prejudices are”

      I just don’t think this is true. Maybe sometimes with respect to some issues. But really, not a very complete picture. If anything, we would have a much better ability to understand the prejudices of an AI system (depending on how it was designed) because ultimately any AI system is going to be designed by humans with properties that emerge from the code written by humans and the input data provided by humans. At the end of the day, the prejudices of an AI system would be more clear and more addressable than prejudice by humans.

      By the way, the problem of prejudice in AI is a real one. Consider this article: http://www.theverge.com/2016/3/24/112…..bot-racist

      Note that the reason the AI had prejudice was that this was the data humans fed into it. So, this is a problem that we can address. And more so than with humans.

      1. (cont.)

        “The second issue is even worse. Present-day judges may live in a somewhat sheltered bubble, but…”

        So, you seem to be excusing some of the prejudices that judges bring to the table (as a result of living a sheltered life), but seem to suggest that this is all outweighed by the fact that judges typically interact with relatives, friends, and neighbors, read newspapers, go to movies, raise children, and experience “frictions” in everyday life.

        Then you go on and suggest that this real world experience “being human” will cause them to bring “compassion and understanding” to their job that more than makes up for their generally privileged existence.

        But, that is only true some of the time. Some judges take pleasure in the opposite of compassion. They berate lawyers and people who show up to court. They are bullies in a robe. “Being human” does not consistently result in “compassion and understanding.”

        One possible advantage of AI judges is that we could use them to replace those judges who end up as bullies or as corrupt. (Recall the judge in Pennsylvania who took kickbacks in return for sentencing more kids to confinement and for longer periods.) Further, compassion is not always an unqualified good. It can sometimes be a form of bias.

        1. “One possible advantage of AI judges is that we could use them to replace those judges who end up as bullies or as corrupt. ”

          Except… somebody has to manage and maintain your AI systems. If there’s a kickback available, it doesn’t much help if it goes to the IT staff instead of the judge.

          For that matter, do the IT staff who manage the AI judges now have to be confirmed by the Senate? Because if they can be hired as a contractor by whoever is in charge of the political machinery at the time, there’s a strong reason to believe that they are not impartial. And if an election changes who is running the government, and they can order updates to all the judges at once…

      2. “We can’t prove that a human-based system is free of prejudice either.” Moreover, we know that it in fact is not.

    2. Well, we already know that the current human-based system is not free of prejudice of a wide variety of sorts. So to the “John Henry Test”, the AI-based system doesn’t have to be “free of prejudice”, it just has to be more free of prejudice than the current system, which is not prejudice free.

      To the second, it’s that same “compassion” that drives judges to give rapists a slap on the wrist because they don’t want to “ruin” a young man’s life and similar such rot. So it’s not like humans being compassionate is always so grand.

      That said, if your laws only work when they’re applied by “compassionate” judges, then that sounds like your law is bad and should be changed.

      1. “That said, if your laws only work when they’re applied by ‘compassionate’ judges, then that sounds like your law is bad and should be changed.”

        Laws are written in advance. Sometimes, conditions arise where actions that are not wrongful nevertheless are a violation of law… this is inevitable because human beings are not prescient. To explain what I mean, consider a state that has a medical marijuana law which allows people with pre-approval to have the MJ, but still criminalizes possession by those without. A gives a ride to B, who has MJ authorization. B is clumsy, however, and some of his lawfully-possessed weed is accidentally left in the car when B leaves. A is now in violation of the MJ possession laws. You might even have a case where B takes a bit along as A+B go to the beach to relax. A and B both get out of the car, and the weed left in the car is constructively possessed by A, because it is A’s car.
        Now, ideally, this situation is fixed at the law-enforcement level, but if they’re on a “get tough on drugs” kick, it won’t be, and the case may get into the legal system. Again, ideally, that would be “No-Bill”ed out of the grand jury, but… you know, ham sandwich… so it might even get to a trial. Will the AI scan the summary and dismiss the case?

        1. Will the AI scan the summary and dismiss the case?

          Will a human?

          If you can’t guarantee that, why should anyone have to guarantee the same for an AI?

          That said, you do know that’s what juries are for, yes?

          1. Juries are not for dismissing cases. That’s a step that happens before the jury.

            Regardless, James is not saying that you have to guarantee that the AI will show proper judgement all the time. As the original article points out, the standard is not perfection.

            James’ point, however, is that while humans might not, AIs cannot ever exercise the discretion necessary. And the discretion is necessary because laws are written by all-too-fallible humans. While it’s nice to say that “if your laws only work when they’re applied by ‘compassionate’ judges, then that sounds like your law is bad and should be changed”, that’s an impossible standard. The poor quality of legislative output is a given – an input that can not be ignored or assumed away when designing a judicial AI system.

          2. “That said, you do know that’s what juries are for, yes?”

            Yes… Grand juries, not petit ones.

  3. We already have something pretty close to AI judges in the form of algorithms that determine people’s likelihood of reoffending and therefore bail requirements. These algorithms are black boxes whose conclusions can’t be adequately analyzed because their code is the proprietary intellectual property of the firms that create them.

    As the role of AI in the judiciary expands, I expect it will follow this track, rather than replacing lawyers and/or judges. That is, the AI will purport to offer objective analysis of evidence and present its recommendations to the human judge. The human judge, believing that the AI’s analysis reflects an objective truth (as opposed to the subjective judgements of its creators), will go along with the AI’s analysis most or possibly even all of the time (this is what typically happens now with the current algorithms).

    Since the AI’s analysis will be construed as “findings of fact” rather than “matters of law”, your options for appealing its reasoning will be limited.

    And what happens when RoboJudgeCo becomes party to a case after their AI becomes ubiquitous?

    1. Those algorithms actually demonstrate the problem I related below: When written to do the best job of predicting likelihood of re-offending, they produced accurate results that weren’t permissible. (They were recommending higher bail for black offenders, basically.)

      They were noticing patterns in the data that people aren’t allowed to notice. And writing them to not do this was a bit problematic, because, how do you program the computer to take into account factors you’re not allowed to admit you’re taking into account?

      1. “how do you program the computer to take into account factors you’re not allowed to admit you’re taking into account”

        Here is one approach. Don’t feed the data about race as an input into the program, either directly or indirectly (using, for example, zip code).

        In general, if you don’t want a certain type of data to influence the algorithm, then don’t train the algorithm using that data. The algorithm (assuming it is well-designed) will come up with the best predictions it can, given the data that is available.

        1. That doesn’t solve the problem. To Brett’s specific example above, the recidivism algorithms specifically were not given race data as part of their inputs. They nevertheless produced race-based outcomes because race is cross-correlated with so many other factors.

          1. I think that’s a misleading way to describe what was going on: The algorithms were producing outcomes which were NOT race based, but objectively based on the data they were given. Which outcomes only correlated with race because the underlying factors they were nominally supposed to take into account were correlated with race.

            It’s the human recidivism estimates which were based on race, to avoid acknowledging the underlying correlation.

            They wanted an algorithm which would replicate the outcome of a racially biased process, without taking into account race. But they couldn’t admit that they wanted a racially biased process, because they were officially committed to pretending that they weren’t racially discriminating.

        2. But you’ve got that backwards. The problem is that the people ARE taking into account the factor, but not admitting they are. So the algorithm won’t reproduce what the people are doing unless told to do what the people are doing, but deny doing.

          Take, for example, college admissions. Many universities have de facto racial quotas in place, implemented by attributing admission decisions to fuzzy “holistic factors”.

          An AI programed to handle admissions decisions would NOT replicate the choices of admissions offices staffed by humans unless provided the racial data, or something it could be inferred from, and specifically programed to implement the quota the university is imposing, but denying imposing.

          I’m pointing out that it’s hard to program an AI to do something you’re lying about not doing.

          1. “I’m pointing out that it’s hard to program an AI to do something you’re lying about not doing.”

            That’s an interesting theory, and might be true in an open-source context. But we have LOTS of cases of computers doing things other than what the people were told the computers were doing… sometimes on purpose (we said we wouldn’t retain your personally-identifying data or track minors) and sometimes not on purpose (we said we’d only use your payment-processing information for legitimate purchases made in our stores, but bad guys managed to insert malware into our point-of-sale terminals and we leaked everyone’s payment-processing information to the bad guys.)

            1. Yeah, if the judge software were a black box written by staff sworn to secrecy, no doubt you could conceal things like skewing verdicts to make the racial statistics look better. Or Easter Eggs designed to let specific people off the hook.

              But how much chance is there that black box judge software written in secret could be widely adopted? If ever there was a case where transparency was demanded, it would be this.

              1. “But how much chance is there that black box judge software written in secret could be widely adopted?”

                Close to 100%. “Elections have consequences.”

                1. You really think a momentary majority could implement AI judges? I think it would likely require a constitutional amendment, and any attempt to go that route with less than overwhelming support and transparency would be the stuff of revolution.

                  Because without transparency, it’s just too obviously a poorly disguised packing scheme.

                  1. “any attempt to go that route with less than overwhelming support and transparency would be the stuff of revolution.”

                    Just like there’s rioting in the street and revolution in the air when one party picks a judge based clearly on which way they expect the judge to lean. Right.

        3. Multivariate regression analysis can determine whether factors are predictive independently, or are merely co-dependent.
          Problem is, no one really wants to know the answer, is race predictive of X, because it is just too toxic to have as an unassailable fact.
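The distinction drawn above, between a factor that predicts on its own and one that only rides along with another, can be illustrated with a small ordinary-least-squares sketch. Everything here is an assumption for illustration: the numbers are made up, `factor_a` and `factor_b` are hypothetical names, and the outcome is constructed to depend on `factor_a` alone. The solver is plain Python so that nothing beyond the standard language is needed.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Build an augmented copy so the inputs are not modified.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every row below the pivot row.
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back-substitute from the last row upward.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def least_squares(predictor_columns, outcome):
    """Fit outcome = intercept + sum(coef_i * predictor_i) via the normal equations.

    Returns [intercept, coef_1, coef_2, ...].
    """
    rows = [[1.0] + list(values) for values in zip(*predictor_columns)]
    size = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(size)]
    return solve_linear_system(xtx, xty)


# Made-up data: factor_b is correlated with factor_a but has no effect of its own.
factor_a = [1, 2, 3, 4, 5, 6]
factor_b = [2, 1, 3, 5, 4, 6]
outcome = [2 * a for a in factor_a]  # depends on factor_a only, by construction

alone = least_squares([factor_b], outcome)               # [intercept, slope on b]
together = least_squares([factor_a, factor_b], outcome)  # [intercept, coef a, coef b]
```

Regressed alone, `factor_b` picks up a slope of about 1.77 and looks predictive. Regressed together with `factor_a`, its coefficient falls to essentially zero while `factor_a` gets a coefficient of 2. Real data is noisier than this constructed case, so the separation would rarely be this clean.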

          1. “Problem is, no one really wants to know the answer, is race predictive of X, because it is just too toxic to have as an unassailable fact.”

            Nah. The people who care are ABSOLUTELY CERTAIN what the answer is, already. And the other 97% of the population understands that individuals can be exceptions from rules. There are a few solid correlations… being black indicates a higher likelihood of sickle-cell disease, and being white indicates a higher likelihood of sunburn and skin cancer, particularly in southern California.

    2. “That is, the AI will purport to offer objective analysis of evidence and present its recommendations to the human judge”

      I think you’ll wind up with expert A from the plaintiff (or the people) reading off what their AI says, and expert B from the defendant reading off what their AI says, and then the human judge applying judgment with this advice in hand. Some judges will favor plaintiff’s AI, and some will favor defense AI, and if either side finds its advice being consistently ignored there will be tweaks to the knowledge base until the balance returns.

  4. How about the Dread Pirate Robots?

    1. That’s the Trojan horse app you installed that’s actually mining for bit-coins when you’re not playing your silly “match-3” game.

    2. Inconceivable

  5. I do not believe that any of what you describe will be impossible or even necessarily very difficult. Artificial intelligence is advancing by leaps and bounds (though largely outside the realm of the academic researchers claiming that term). I’ll go further than your ‘ordinary schlub’ test and say that automation becomes viable as soon as it can cost-effectively replace the worst humans doing the task. Self-driving cars don’t even have to be better than average drivers to make sense – it’s enough that they are better than the human drunk drivers.

    My fear is in the long-term consequences of the hypothetical judicial automation system’s approach to precedent. Consider a system trained by expert humans during the Korematsu era. Since we expect the robot lawyers and judges to match the decisions of human lawyers and judges, the system will be tuned to reach that same result. Decades later, human judges and lawyers can look at that decision and realize that it was deeply immoral and wrong even though no objective facts have changed. Machines, however, have no way to self-update their decision algorithms. In the context of avoiding bias, that’s an advantage – it leads to greater consistency. The unintended result is that the biases of the present get locked in and bad precedents like Korematsu perpetuated.

    1. (con’t)
      And lest you argue that the robotic system would be subject to continued training and tuning, I ask what data will be used for that tuning. Once robots can cost-effectively replace the entry-level lawyers (as you point out above, a low standard), how do new lawyers get trained? Once robots can replace average experienced lawyers, where do the best lawyers come from? There will eventually be few and then no human comparator decisions. The alternative is to evaluate based on general opinions of the outcomes (such as ‘the public at large thinks Korematsu was immoral’), but that approach quickly devolves into a popularity contest – an outcome that judicial systems are designed to avoid.

      1. I can’t help but wonder how much of this already applies to human lawyers and judges. As a layperson, I’ll often read about court decisions and ask myself “how the hell did they get *that* result?”

        From outside the legal profession, it seems as if we might as well be using impenetrable AI systems.

        1. ” As a layperson, I’ll often read about court decisions and ask myself ‘how the hell did they get *that* result?'”

          There are two ways where this happens. One, a law is written with a set of assumptions. Then, the underlying circumstances change to make one of the assumptions false. The law is then applied as written, because the legislature is supposed to fix that, not the courts.
          The other way is when something truly is new, and, looking for precedents, there isn’t anything that is really close, so, rather than find a (new) answer, they shoehorn the new whatever into a precedent that isn’t really applicable… but is the closest they can find. (Often, misunderstanding the “whatever” is a part of this… with the misunderstanding ranging from just the judge all the way to the judge, both parties and their lawyers, and the various professional “experts”.)

      2. Except part of the “continued training and tuning” would be legislators passing new laws in response to the robot’s decision.

        So sure, since the robots would be making judgements based on the actual laws rather than twisted contortions thereof, legislators would have to do a better job of writing laws such that the real results match the intended results better.

        Throw in governors and presidents commuting or pardoning as another input to the robots, and they have another point of “continued training and tuning”.

      3. “And lest you argue that the robotic system would be subject to continued training and tuning, I ask what data will be used for that tuning.”

        You can overrule a bad decision (and its precedential effects) by amending the Constitution. Scott v. Sandford? Wiped away by the 14th Amendment.

  6. While I largely agree with you there are a number of reasons that functioning as a judge is much harder than writing persuasive briefs.

    To write a persuasive brief you merely need to be good at doing something. However, as a judge every lawyer coming before you is looking for a tactic or scheme to gain an advantage and they will intelligently attack any points of weakness.

    While I don’t have much faith in human reasoning to be great at abstract mathematical, moral or scientific reasoning the one thing evolution surely has done is given us very robust defenses against being taken advantage of by other people.

    I expect it will be much easier to create an AI that is good at even a general skill like creating persuasive arguments than it will be to ensure the AI has no weaknesses human lawyers can exploit (oh look, since the AI is ruling there are no page limits and we can just submit a million pages and count on rounding errors in accumulating evidence to push up our chances of winning).

    Second, judges in some sense are expected to act as the last line of defense when something totally unexpected happens in society and a decision needs to be made and that means they need to be competent to understand all sorts of weird human tendencies/behaviors (sex, fetishes, weight gain etc… and not merely from a statistics POV)

    1. I was about to echo your first sentence but on different grounds.

      Creating a persuasive argument, maybe even the best possible argument, is qualitatively different than evaluating whether that best argument is actually a good argument.

      For an AI lawyer to be ready to become an AI judge it needs to be able to create that argument but then give its client good advice about whether to pursue it.

      1. “For an AI lawyer to be ready to become an AI judge it needs to be able to create that argument but then give its client good advice about whether to pursue it.”

        In the legal context, “Good” advice means accurately predicting what the judge’s (and the jury’s) opinion will be.

        1. If it can create its own argument it can also anticipate the opponent’s. Accurately assessing the likelihood of one or the other being the more convincing is a harder problem.

          1. Human beings don’t do that well, either. Every trial consists of at least one person who’s dead wrong about how the case is going to turn out. Lots of punditry exists with “experts” laying out what the Supreme Court is going to rule on this or that important case… some people have better betting records than others, but most of those cases don’t come out 9-0, either.

    2. The exploitability is a massive problem with the proposal. Imagine Watson playing Jeopardy, except the game is changed and the contestants are charged with giving questions that the computer will answer incorrectly, and whoever’s questions Watson is worse at wins. People would quickly figure out what kinds of things Watson was likely to make mistakes on and how to exploit the way the AI is built. Same thing would happen with AI judges. Winning parties would be the ones who exploited the programming the most.

      1. What will happen is that a known way of “improperly” influencing the judge-bot will be discovered and exploited a few times. A fix will be developed… but won’t be applied because we cut the budget for the courts once the judgebots replaced the expensive Article III judges whose pay can’t be reduced… and that meant that the IT staff didn’t have time to get around to all the courtroom server stacks.

  7. “We should focus on the quality of the proposed AI judge’s product, not on the process that yields that product.”

    WHOA!

    It seems to me that judicial law is mostly (mainly?) about process and how judges arrive at decisions is just as important as the decision itself.

    1. Yes, but that process is external to the judge’s mind. EV is talking about focusing on whether the judge denies/grants the right motions and writes the right decisions based on established process, not how their ‘brain’ works.

      Also many of the procedural rules are about minimizing the unfairness created by having a bunch of different humans rule on different cases.

      An AI judge can be copied and the same judge can be used on every case and won’t need to recuse themselves. They can literally delete statements that are more prejudicial than probative from their memory. So they do better in many regards here.

      1. I actually meant the process internal to a judge’s mind – not the external procedures (which mostly can be measured by some Yes/No standard).

        In Friday’s Round Up, there was a case about a cop shooting an unarmed person in the back, and the court(s) had to decide competing witnesses’ credibility.

        Additionally, the court(s) had to address: “heat of passion,” “adequate provocation,” the meaning of “Provocation is adequate when it would arouse a reasonable and ordinary person to kill someone,” and the “adequacy of the provocation, ‘by the standards of a reasonable officer.'”

        I can’t see how a computer can determine credibility, heat of passion, adequate, or reasonability.

  8. Actually I’d also add that a vital function of judges is to give people the impression that their concerns have been heard and someone was willing to listen to them sufficiently long to understand their position and rule on it.

    Judges don’t always do this but since they are a high status human we are willing to credit them (prob more than they deserve) on this front. However, no matter how competent AI judges are I worry people will see them just as mere machines and always feel that their argument still wasn’t heard and if only a person would listen.

    Also, weirdly, we seem more willing to credit AIs with unfairness than people. Perhaps bc people fight back when you call them racist or bc their thought patterns sound like ours but it seems true all the same.

  9. I see a basic problem here, that you can already see in other areas of artificial intelligence research, but which is likely to be particularly serious in law: The task you’re trying to program the AI to do isn’t the same as the task you expect it to perform.

    You are, nominally, trying to program the AI to apply the law and precedent. But you’re expecting the AI to reproduce the behavior of judges. This is going to require explicitly telling the AI to not actually apply the law and precedent. Because if the AI relentlessly applies the law, as written, and the precedents, as recorded, it isn’t going to reproduce the expected behavior of judges. You can just about guarantee that.

    They’ve run into this problem repeatedly in AI, where the programmers are told to make an AI which will implement some task, and it does so, but too accurately, failing to reproduce the unspoken biases of the people whose activities it is to duplicate. Because they’re unspoken, can’t be explicitly acknowledged, and that makes it really hard to write into the software.

    “Pick the best applicants for the job, but the ‘best applicants’ have to have the right gender and racial qualifications. Oh, and it would be illegal to write a quota into the program… so make sure the program doesn’t explicitly take race into account.”

  10. Professor Volokh,

    I think (1) is a critical problem. In many endeavors of life, there is a generally-agreed desired outcome. Litigation is not one of them. By definition, one party wants one outcome and the other party(ies) want something else. So by this criterion, every robot judge will always satisfy 50% of the litigants. A coin flip would be just as good.

    This is why, in law, the reverse of the process you propose is used: instead of looking at the outcome and seeing if we like it, we instead look at the process of reasoning to reach that outcome and determine if it meets some sort of legitimacy criterion. If we consider the process legitimate, we accept the outcome even if we don’t like it.

    The process is complex, and virtually all judges introduce elements of innovation at least at the margins.

    1. John Henry’s success criteria were straightforward, length of tunnel drilled in a fixed time (with certain other criteria including width/height and structural soundness).

      But there are no such straightforward, easily and objectively measurable success criteria for many if not most human endeavors, including law.

  11. Also, an engine that runs on water would have a bit of a prior art problem. Water mills, hydroelectric dams, etc.

  12. JFC. Even considering this is crazy.

    Replacing humans doing dangerous or tedious work with machines is one thing. Turning a vital function over to them is madness.

    1. A+ on the post title however.

  13. “A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools.” –Douglas Adams

    This is an important thought experiment. And we had best wrap our minds around the problem because some fool is going to try to make an AI lawyer or judge.

    1. And as soon as someone actually manages to make something foolproof, the universe will invent a better fool.

  14. Interesting. It might work if the AI was only programmed with the text of the constitution, a plain English dictionary, and a grammar textbook. Turn the crank and see what happens. If gobbledygook comes out, a constitutional amendment is needed.

    1. Seeing as even “originalist” judges don’t reason that way, I’m pretty sure that’s not a fair test.

      1. “Seeing as even “originalist” judges don’t reason that way”.

        See textualism.

        1. “See textualism.”

          See “actual jurisprudence”

      2. It would be the only way to trust an AI to be fair about it. Otherwise the AI would be influenced by the selection of whatever else whoever is programming it chooses to add.

        What else would you use to program it?

        1. Well, you’re going to need a WHOLE LOT of process logic.
          You can encode the text of the Constitution, the entire US Code, AND all the reported cases going back to Founding, quite easily. The technology to do this was well understood back in the 60’s (although a bit costlier for storage back then.) But to actually DO anything with all that stored information, the computer needs to understand what it is you want done. Citation checking, for example, is fairly simple, as is text searching. So, for example, if you want to know if there are any cases involving crabcakes in Maryland, a computer can help you out. But sorting out what those cases MEAN is a much more involved process.
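
          A minimal sketch of the "easy" half of that point, assuming a tiny in-memory list of case records. The `CaseRecord` type, the `search` helper, and the three records are invented placeholders, not real cases or any real legal database API. Finding the matches takes one line; nothing here says what the matching cases mean.

          ```python
          from dataclasses import dataclass

          @dataclass
          class CaseRecord:
              name: str    # placeholder case name
              state: str   # two-letter state code
              text: str    # full text of the opinion

          # Invented placeholder records, not real cases.
          CASES = [
              CaseRecord("Placeholder Case 1", "MD", "A dispute over a restaurant's crabcakes recipe."),
              CaseRecord("Placeholder Case 2", "MD", "A boundary dispute between two farms."),
              CaseRecord("Placeholder Case 3", "VA", "A shipment of crabcakes spoiled in transit."),
          ]

          def search(cases, keyword, state):
              """Return names of cases from `state` whose text contains `keyword` (case-insensitive)."""
              kw = keyword.lower()
              return [c.name for c in cases if c.state == state and kw in c.text.lower()]

          print(search(CASES, "crabcakes", "MD"))
          ```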

        2. What else would you use to program it?
          Case law.

          1. You could propose one big constitutional amendment that includes all this case law you are speaking of and go from there.

  15. One of the best rebuttals to the value of AI is to be found in the work of Joseph Wood Krutch. I don’t know that he ever directly addressed AI, but consider this (Wiki) description of his book, The Measure of Man, where he “argued that there are aspects of human beings, such as reason, consciousness, free will, and moral judgment, that cannot be explained by mechanistic, deterministic science.”

    In that book, he stressed that even where human behavior can be described and predicted by scientific theories most of the time, human free will is important even if it is, to quote Shakespeare, “More honour’d in the breach than the observance.”

    Current law operates in robotic or algorithmic ways. A judge is bound to follow precedent, even if thought flawed and problematic, for example. But then we see those splintered decisions and instances of judges running from bad precedent, even where they don’t really have the authority. Of course it’s not meaningful simply that there are aberrations (and even a machine could offer superficial aberrations). But the reasons for them are deeply important. We would do a great disservice buying into AI verisimilitude. That we are sufficiently predictable and flawed to be convinced by a machine (or by people) doesn’t mean a machine actually is as good. And we lose something crucial.

  16. After 45 years working with other programmers, my considered opinion is that this is a really bad idea.
    Sounds like the courts would become the legal equivalent of the social account deleting algorithms.

    1. As a programmer of almost 20 years – agreed!

  17. David Gerrold, in, if I recall correctly, one of his Chtorr novels, described the litigation storm involving “Baby Bill’s Dollar”. A bequest of one dollar had been willed to one young Bill. It became the topic of a dispute, and litigation ensued.
    Because of the existence of AI lawyer-bots, litigation was cheap, and endless numbers of lawyer-bots could be dispatched to do battle with other lawyer-bots. After some number of years, the legal fees had reached into the billions of dollars.

    A number of the comments in this thread have been along the lines of “I don’t see how…”
    Well, lots of people have responded that way to new possibilities, and yet someone has found a way to make things happen. So in the spirit of Clarke’s Second Law, let’s allow for the possibility this is one of those times.

    Stipulating that lawyer-bots capable of doing what the average lawyer does become available, eventually they will be widely available, and cheap. It’s one thing when your use of a lawyer-bot is to ask about the legality of some action. It’s another when your law-bot can file suit against other people. Those other people will then have to respond, either in person, through their own lawyers, or through law-bots of their own. Is there an equilibrium point short of litigation consuming all available computing power, and technological civilization grinding to a halt?

    1. “Because of the existence of AI lawyer-bots, litigation was cheap, and endless numbers of lawyer-bots could be dispatched to do battle with other lawyer-bots. After some number of years, the legal fees had reached into the billions of dollars.”

      I’m seeing a bit of an internal contradiction within this scenario.

      1. It depends on the precise shape of the demand curve. It’s not uncommon for a drop in price to lead to an increase in demand that results in more money spent, in aggregate.

        If a lawyer’s time costs $100 per hour, some number of hours will be demanded.
        If a lawyer-bot’s time costs 1¢ per hour, a lot more hours will be demanded.
        A single individual is likely to think twice about paying for even a single hour at $100, but might regard the purchase of 100 hours per day as a trivial expense. This trade off alone represents at least a 265% increase in the cost of litigation in general. And suddenly, it’s worth spending another $1 or 2 per day on lawyer-bots to defend against others’ lawyer-bots (rather the way computer virus software is now considered a reasonable expense).
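
        One way to read the arithmetic above, as a rough sketch: the 265% figure seems to assume that a single $100 hour is all a person would otherwise buy in a year, while the 100 bot-hours per day run all 365 days. That one-hour-per-year baseline is an assumption, not something stated in the comment, and the names below are made up for illustration.

        ```python
        HUMAN_RATE_CENTS = 100 * 100   # $100 per hour for a human lawyer
        BOT_RATE_CENTS = 1             # 1 cent per hour for a lawyer-bot

        def yearly_bot_spend_cents(hours_per_day, days=365):
            """Total spent on lawyer-bot hours over a year, in cents."""
            return BOT_RATE_CENTS * hours_per_day * days

        # Assumed baseline: one human-lawyer hour bought in the whole year.
        baseline_cents = HUMAN_RATE_CENTS * 1
        bot_cents = yearly_bot_spend_cents(hours_per_day=100)
        increase_pct = (bot_cents - baseline_cents) * 100 / baseline_cents

        print(f"bot spend: ${bot_cents / 100:.2f} per year, increase: {increase_pct:.0f}%")
        ```

        Under those assumptions the cheaper service ends up costing $365 a year against $100, which is where a 265% increase would come from.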

        1. I don’t think any even vaguely rational programmer is going to design a system where the auto-lawyer racks up more than $1 of expenses litigating over $1. Or, rather, some would, but the market would favor lawyer bots that know when to quit.

          1. But entirely rational decisionmakers (both traditional and programmers) already work within systems where exactly that problem occurs on a regular basis. The problem is sunk costs. Consider:

            Possible suit with a $1 payoff. Initial estimate is that it will cost 50¢ to win. Your choice is zero (no suit) or 50¢ profit (sue). Net value is positive so pursue the case.
            A week later, you’ve already spent the 50¢ but you re-run the analysis and realize it will cost an additional 50¢ to win. Still the same payoff. Do you continue the case? The money already spent is lost. Your choice now is -50¢ (stop the suit) or $0 (continue). Value of continuing is better than stopping so you continue.
            Repeat step 2 until you reach $1 billion.

            1. That’s where stop-loss features come into play.

              1. Stop-loss leads to an economically irrational decision.

                Say your stop-loss threshold is $5. In step 10, you’ve already spent $4.50. You re-analyze the case and conclude it will cost an additional 50¢ to win the case and you still have the same $1 payoff. Your choices now are -$4.50 (stop the suit) or -$4.00 (continue).

                While both those outcomes are worse than if you’d never started, time travel is not available in this hypothetical. The rational choice is to minimize your losses by again continuing the suit. (Though it would probably also be rational to fire whoever told you the previous nine times that it would only cost 50¢ to win the case.)
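
                The step-by-step comparison can be sketched in a few lines, using the figures from this sub-thread ($1 payoff, 50 cents per re-estimate) and working in whole cents. The function names are made up for illustration, and the sketch assumes the estimate is wrong every round except the last.

                ```python
                PAYOFF = 100      # cents won if the suit succeeds ($1)
                STEP_COST = 50    # cents each new estimate says is still needed

                def options(spent):
                    """Return (net if we stop now, net if we pay one more step and win)."""
                    stop = -spent
                    cont = PAYOFF - spent - STEP_COST
                    return stop, cont

                def run(rounds_until_win):
                    """Apply the marginal rule each round; money already spent is ignored."""
                    spent = 0
                    for _ in range(rounds_until_win):
                        stop, cont = options(spent)
                        if cont <= stop:      # never true while PAYOFF > STEP_COST
                            return -spent
                        spent += STEP_COST
                    return PAYOFF - spent

                print(options(450))   # choices at step 10, after $4.50 has been spent
                print(run(10))        # final net, in cents, if the suit is won on round 10
                ```

                Because the sunk amount is subtracted from both options, continuing wins the comparison by the same 50 cents at every step, however much has already been spent.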

    2. Is there an equilibrium point short of litigation consuming all available computing power, and technological civilization grinding to a halt?
      Sure.

      Juries.

      Simply put, if both the judge and lawyers are replaced with robots, then we need juries for all such legal action, even ones that do not currently use them.

      So while the robots can operate at Mach-whatever, they still have to wait for the jury’s verdict before they proceed. And if too many people find too much of their time being consumed in the jury box, then legislators will respond and put limits to such nuisance lawsuits.

  18. Caveat: I’m not an attorney, lean libertarian.

    Eugene – I think it’s great and important that someone with your peculiar mix of talents contribute to the thinking on this subject. I have only read the post here and not the full paper, but have the following thoughts.

    “Evaluate the Result, Not the Process,” is an engineering way of looking at this issue, not a human-centered one, and I’d like to respectfully push back on your conclusion. I think for participants in the legal system to be comfortable with an AI decision-maker, it will require that the *process* be trusted, since one party at the conclusion of the legal process will not be happy with the result (by definition, in an adversarial system). Ideally this would be accomplished with perfect transparency into the process that is followed, and with an AI involved, it would seem to be an easy effort to make available all of the logic and data that was processed in reaching the conclusion. (1 of 3)

    1. During the development of judicial AIs, I agree that the legal engineers will have to run parallel process to become comfortable with the result, but it is really the *process* that they are measuring by doing so (e.g., “Compare the Results to Results Reached by Humans”). I also think that civil libertarians need to focus entirely on the *process* rather than the result in order to be sure the process protects human rights. Humans can be satisfied with a fair process that generates a flawed result, but will have little trust for a flawed process that generates a positive result (witness society’s willingness to accept democracy vs. a benevolent autocracy). Our current criminal system is (in theory) designed to let some actual criminals go free in order to prevent the violation of civil rights or to wrongly punish an innocent person. In the current US legal system, I think it is the process that is revered, *not* the outcome. If you argue the opposite in this post, I’m sorry I missed that point. (2 of 3)

      1. “[M]atch the performance of the average person whom we are considering replacing.” I think this is exactly the right message and I would lead with this. Even skeptics would have to agree that the lowest common denominator in the legal process is a very low bar. I know you are being provocative to use “Chief Justice Robots” as your title, but I think leading with this is exactly how the skeptics will slow the adoption down. Just as you describe the promotion of AI from roles as assistants/tools toward eventual decision-makers, I believe the adoption in the court system would run upstream, starting with judicial AIs replacing arbitrators and mediators and mock juries in commercial and private disputes (an open-source AAA?). In these forums, the inclusion of logic like Monte Carlo analysis could value uncertainties in a much more clever way than the winner-take-all stakes often involved in a criminal matter. I think including humans as the final decision-maker in the appellate courts and above will be the norm indefinitely. Although I completely agree with your progression, I think putting the Chief Justice Robots out there might focus too much attention on a futuristic end-state that will be decades (or longer) in the making.

        Ttfn (3 of 3)

  19. For a different picture of future courts see the 1961 science fiction novel (not the short story) Monument by Lloyd Biggle. Writing before current AI concepts, he imagined a court where the lawyers “play” their case by providing citations in favor of their argument, with the computer “judge” determining the validity of the citations and their relative weight. A desperate lawyer would use citations which had been overturned in the hope the judge wouldn’t notice.

  20. As with any other form of advanced automation, AI in the legal field ought not be conceived as a set-it-and-forget-it operation. It will be necessary to continually monitor the behavior of AI algorithms to ensure they do not wander off-track; at the beginning, we must recognize that this is more than possible, it is probable.

    How will we evaluate “off-track?” Probably through continuing review by human experts similar to those who gave the initial stamp of approval.

    Unintended consequence: over the course of time, the pool of human experts will shrink as AI takes on more of the work and they have less opportunity to gain experience and to think deeply. So the continuing review will end up being conducted by competing AI bots; we depend on a thriving market for alternatives.

  21. The weakness of generating results in accordance with the law is that they will often be unjust.

    1. Whereas generating results not in accordance with the law will often be unjust, too. Just in different ways.

  22. Skipping beyond the clerical errors on pages 31, 38, 48, and 49 of the draft — and generally agreeing with the material up to that point — footnotes 149 and 152 contain what is for me the most important caveat: “Even if AI judges don’t respond enough to changing public attitudes and changing social conditions, the human evaluators who decide which AI judge to select would be able to so respond” provided that the AI judges are “chosen by evaluators who are more in step with current attitudes.”

    This makes the definition, selection, and tenure of the “evaluators” critical. Currently, judges are not selected by “a blue-ribbon panel of trained observers” and, absent impeachment, serve in office only as allowed by both nature and human forces (that is, God can kill a judge, but so can a sole assassin or an angry mob). Let’s assume an AI judge could be impeached through existing processes.

    The fact that a judge — AI or human — is a “good judge” is oddly irrelevant: as a collective, we want the judges we want at the moment and for an arbitrarily limited time thereafter. A human judge and his opinion are ephemeral: how can an AI judge and its opinion, both dependent upon the AI evaluator(s), be made equally ephemeral?

  23. Why does there have to be more than one AI entity if there is a presumed “good” outcome?

    Why do you want 2 lawyer bots trying to persuade a third one? Why isn’t it one perfect program doing everything?

    Is it because we want varied results based on the competency and biases of the programmers?

  24. We already have “computer aided practice of law”, thanks to our friends at Thomson/West.

    I don’t see AI judges on the near horizon. AI paralegals, maybe. Just like the ordinary human kind, they’ll still have a human being, with legal education, who signs all the paperwork, though.

    The first advantages come in scheduling. The boss is in a depo, and it runs longer than expected, so the AI paralegal looks at the rest of the day’s scheduling, and starts to move things around. An ordinary, non-AI computer can do some of this, already, but only with other computers that share a common setup. Put the AI on the job and it can start making phone calls to coordinate with actual human beings instead of their Outlook calendar.
    Then you’ll get proofreading and flagging of legal documents, and actual recomposition. Discover that the page limit is too short? The AI will compose the motion to allow overlong documents, and juggle the document(s) to fit under the limit (just in case) as well as write the longer, more complete document(s).

    Eventually, you’ll get the Star Trek interface, where you just tell the computer what you want instead of having to type everything out, and it’ll actually guess fairly accurately what you meant to say rather than what you actually said, and use that instead of the mistake you actually told it to do. Until you get at least that level of AI, everything still requires a human being’s authorization for the computer to do it.

  25. I was asked to pass along the following comment by the Sweet Meteor O’Death (@smod4real):

    “I’m perfectly comfortable turning the judiciary over to Cyberdyne Systems.”

  26. We may or may not someday have robotic justices.

    Why do we need a ‘chief justice’ at all? Someone has to administer the docket: Beyond that the Chief doesn’t do anything the rest of the justices don’t.

    1. US Constitution, Article 1 Section 3:

      The Senate shall have the sole Power to try all Impeachments. When sitting for
      that Purpose, they shall be on Oath or Affirmation. When the President of the
      United States is tried, the Chief Justice shall preside: And no Person shall be
      convicted without the Concurrence of two thirds of the Members present.

      1. Interesting.

        That’s the ONLY place in the Constitution that mentions the Chief Justice. Article III certainly doesn’t.

        Still, seems like the position itself is quite supernumerary.

  27. We can already write an algorithm to replace Justice Ginsburg, no AI required. I’ll do it right now!

    //GINSBORG.cc v1.0

    int main(case, decisionOptions)
    {
        decisionCurrent = decisionOptions[0];

        if (case.political == TRUE)
        {
            // Keep whichever option has the highest leftistScore.
            for (i = 0; i < decisionOptions.length; i++)
            {
                if (decisionOptions[i].leftistScore > decisionCurrent.leftistScore)
                    decisionCurrent = decisionOptions[i];
            }
        }
        else
            decisionCurrent = decisionOptions[ randomNum() ];

        decisionFinal = decisionCurrent;

        return decisionFinal;
    }

    1. Alright you guys have the weirdest most broken commenting system in the world…heres the code

      https://pastebin.com/0FQUqptn

      1. The comment system intentionally has filters in place to keep people from putting code into the comments.

        I think that’s a Good Thing.

        1. Why? Its not like you can script from here and launch a virus AFAIK. Its just text.

  28. This reminds me of the paper “The British Nationality Act as a Logic Program”, by Sergot, Sadri, Kowalski, Kriwaczek, Hammond and Cory, Communications of the ACM, May 1986. Their argument was that the act in question could be expressed as a set of logic statements, allowing purely mechanical decisionmaking. They expressed the hope that all laws could eventually be reduced to this form.

    They made an interesting case, but followup letters were vigorously opposed, and, in my opinion, pretty well demolished their argument. One theme was that, despite appearances, the Act contained multiple points where human judgement and interpretation were necessary.

    This is a little different from the argument here, which is essentially that judgement and interpretation can, thirty years later, also be mechanized. But my suspicion is that we’re a long ways from that, just as we’re a long ways from getting legislatures to write laws in Prolog.
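
    To make the "law as logic statements" idea concrete, here is a minimal sketch of a made-up, simplified birth-citizenship rule of the general kind the paper formalizes. It is not the Act's actual wording, and the class and field names are invented for illustration. The rule itself is mechanical; the judgement hides in the inputs, such as deciding whether a parent counts as "settled".

    ```python
    from dataclasses import dataclass

    @dataclass
    class Parent:
        is_citizen: bool
        is_settled: bool   # "settled" still has to be decided by someone

    @dataclass
    class Person:
        born_in_country: bool
        born_after_commencement: bool
        parents: list

    def citizen_by_birth(person):
        """Hypothetical rule: born in the country after commencement,
        with at least one parent who is a citizen or is settled there."""
        return (
            person.born_in_country
            and person.born_after_commencement
            and any(p.is_citizen or p.is_settled for p in person.parents)
        )

    child = Person(True, True, [Parent(is_citizen=False, is_settled=True)])
    print(citizen_by_birth(child))
    ```

    Once the boolean fields are filled in, the answer follows automatically, which is roughly where the followup letters said the real difficulty had been moved rather than removed.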

    1. Awesome! When first reading the post, I actually pulled out my printed Vol 9 No 1 (March 1984) copy of ACM Transactions on Database Systems to review the wonderful “A Database Management System for the Federal Courts”. Time changes everything.

  29. The AI is only as good as the initial programming and the training. Garbage-in Garbage out (GIGO). So far, AIs do not have a great record in criminal justice. AI is used in some localities to predict risk of recidivism and provide guidance for sentencing. So far, some have been shown to over-predict recidivism in black populations and under-predict recidivism in white populations. That’s because questions like “did your father do time” etc. are used. That’s a correlation with recidivism due to other factors, but not actually predictive. Rather unfair to visit the sins of the fathers on the sons.

  30. First we need a rational legal system.

  31. Any AI is at the mercy of its algorithms. How does the programmer tell the AI judge how to interpret the laws it is given? Does it apply originalism, textualism, living constitution, etc.? The algorithm HAS to be given a methodology for applying the tenets laid out in the constitution and associated laws. Once it is decided which methodology is to be used – how does the AI become any different than its human counterparts? AI judges programmed with originalist interpretation techniques will provide rulings more consistent with those who want such rulings, and those programmed with other techniques will reflect theirs.
    If we have to provide human programmers to adjust these algorithms according to the current dominant legally preferred methods of interpretation, why don’t we just stick with the human ones to begin with?

  32. I’m not sure the success of this proposal hinges on the excellence of the implementation of the AIs so much as the acceptance of society of AIs as sharing similar mindsets.

    http://huewhite.com/umb/2019/0…..ice-robot/

    Hue

    1. In large part, I agree with the notion that the bulk of society must, at every instant, share a mindset with the AI (or at least tolerate the mindset of the AI).

      It seems that an AI Judge would have three easily-implemented modes — the first being “identify and understand my nominator”, the second being “persuade my Blue-Ribbon evaluator(s) to allow me to function as a judge”, and the third being “judge cases before me using a school of thought generally matching that of my nominator, with random departures which placate and/or befuddle my detractors” — and would have a random operational lifespan exceeding both that of its nominator and that of its evaluator(s).

      Is this a “good judge”? No! Is this a judge which might be as acceptable as any sitting judge? Perhaps.
