Chief Justice John Roberts on AI and the judiciary

4 January 2024

On 31 December 2023, US Supreme Court Chief Justice John Roberts published his 2023 Year-End Report on the Federal Judiciary. According to C.J. Roberts, “Every year, I use the Year-End Report to speak to a major issue relevant to the whole federal court system.” This year, that “major issue” is AI.

Before addressing AI directly, C.J. Roberts eases into the topic by drawing a parallel to the waves of technological change that have previously rolled ashore: from the typewriter, to the photocopier, to Justice Lewis Powell’s “rented Wang computer”, to the Supreme Court’s Atex system, to personal computers, to CM/ECF, which “brought about a seismic shift in efficiency.” (Perhaps, though as a user of this system, I can report that not a single thing about CM/ECF has changed in decades, including its trapped-in-amber mid-90s web design.) C.J. Roberts notes that the federal judiciary still maintains a charmingly retro toll-free number for case information—for a good time, call 866-222-8029. Interestingly, C.J. Roberts doesn’t mention any failed technologies, except for the Paige Compositor—though it doesn’t appear the Supreme Court actually owned one of those.

Having set a somewhat sunny tone, rooted in the pleasant history of office automation, C.J. Roberts turns specifically to AI. These comments deserve closer scrutiny:

At its core, AI combines algorithms and enormous data sets to solve problems.

Neither here nor later does C.J. Roberts mention that AI’s reliance on “enormous data sets” is already causing problems, some of them legal problems, some of which the Supreme Court may be called upon to solve. (GOFAI enthusiasts might quibble that C.J. Roberts consistently uses “AI” as a metonym for “machine learning”, but let’s go with it.)

Would the federal judiciary adopt a certain technology while it was still clouded by legal uncertainty? Perhaps it’s happened before. A dishonorable mention in this category: Zoom, whose software is widely used by the federal judiciary for hearings, even though the company was previously charged by the DOJ for its connections to China’s intelligence services, and was also the target of an FTC complaint for misleading users about its data-security practices. Nothing that ought to bother legal users, I guess?

Its many forms and applications include the facial recognition we use to unlock our smart phones and the voice recognition we use to direct our smart televisions.

C.J. Roberts apparently means to suggest that AI has already seeped into our daily technology interactions—true—but in so doing, creates a questionable parallel between different kinds of AI that have qualitatively different stakes. If a television misunderstands a voice command, or if it takes two tries to unlock a phone—no big deal. But when we consider injecting AI into the courts, we are setting out a far more significant task, with graver consequences should it go awry.

Law professors report with both awe and angst that AI apparently can earn Bs on law school assignments and even pass the bar exam.

Surprisingly, C.J. Roberts doesn’t pause to consider that lawyers themselves are part of the “technology stack” of the federal judiciary, and thus that the effects of AI on legal education—for good or ill—will necessarily ripple out to the judiciary and possibly create external costs. I have written previously about AI in classrooms, which I think is a horrible idea, though I know nothing about educational policy. But Prof. Amy J. Ko does, and agrees that AI threatens to erode core educational mechanisms. She and I are definitely in the minority, however.

Legal research may soon be unimaginable without it. AI obviously has great potential to dramatically increase access to key information for lawyers and non-lawyers alike. But just as obviously it risks invading privacy interests and dehumanizing the law.

I would be very hesitant to put “AI” in the same sentence as “access to key information”. The output of the typical AI language model is optimized for syntactic and linguistic plausibility rather than any kind of underlying semantic truthfulness. Thus, in the same way that a broken clock is right twice a day, the AI may often emit something that happens to be true. Other times—not. See also Searle’s Chinese Room argument; see also Harry Frankfurt on bullshit (“bullshitters seek to convey a certain impression of themselves without being concerned about whether anything at all is true”); see also Gary Marcus, who has written extensively on the fundamental lack of truthfulness of AI language models—not in the sense of deliberately lying, but in the sense of simply having no ability to tell the difference between fact and fiction, any more than a colorblind animal can distinguish red from green.

For that reason, I worry less about “dehumanizing the law” and more about the law becoming permanently infested with AI-induced misinformation. On that point—I predict that within three years, there will be a published court decision that includes a hallucinated AI citation that snuck by every level of human verification. The danger? Those who rely on the opinion later will assume—as they have traditionally and reasonably done—that the sources cited within are legitimate. At that point, the hallucinated holding could be laundered into a real holding. To be clear: no malicious intent is needed anywhere in this process. All that is needed is the initial credulousness of a few.

I also wonder about the risks of focusing the law around the ideas most common in the AI model’s training dataset. True, most fact patterns that arise in legal cases are going to lie near the common cases. But some won’t. What happens to those? Will legal-oriented AIs know when a case calls for a novel argument? “Well, many human lawyers wouldn’t notice either.” I’ve never followed this line of thinking—we should embrace technologies that do things as poorly as we do? To what end? This argument also neglects the effect of automation bias: the tendency of humans to give more weight to a result provided by a machine.

Proponents of AI tout its potential to increase access to justice, particularly for litigants with limited resources. Our court system has a monopoly on many forms of relief. If you want a discharge in bankruptcy, for example, you must see a federal judge. For those who cannot afford a lawyer, AI can help. It drives new, highly accessible tools that provide answers to basic questions, including where to find templates and court forms, how to fill them out, and where to bring them for presentation to the judge—all without leaving home. These tools have the welcome potential to smooth out any mismatch between available resources and urgent needs in our court system.

This gives me pause, because it sounds a little like C.J. Roberts foresees a future where citizens of “limited resources” get steered toward a more limited form of justice—that dispensed by AI. Traditionally, the judiciary has been more protective of its vulnerable litigants. To expand on C.J. Roberts’s example: if a self-represented litigant were to seek AI advice on bankruptcy filings, and the AI happens to hallucinate something damaging, who ought to be accountable? The bankruptcy court, for recommending the AI? The AI vendor, for making it? The litigant, for using it?

But any use of AI requires caution and humility. One of AI’s prominent applications made headlines this year for a shortcoming known as “hallucination,” which caused the lawyers using the application to submit briefs with citations to non-existent cases. (Always a bad idea.) Some legal scholars have raised concerns about whether entering confidential information into an AI tool might compromise later attempts to invoke legal privileges.

Curiously, C.J. Roberts seems to frame “hallucination” as a bug in a particular AI system, rather than as an intrinsic—and so far unsurmounted—limitation of an entire technology category. I dislike the term hallucination in any case because of its association with an altered state of mind or degenerate behavior. The question of whether an AI language model will emit a damaging hallucination is not if, but when—it is entirely normal and probabilistic behavior. Maybe we can update that hoary old lawyer joke for the AI age—Q: How do you know when an AI lawyer is lying? A: It’s producing output. (Hmm—maybe let’s workshop that.)

In criminal cases, the use of AI in assessing flight risk, recidivism, and other largely discretionary decisions that involve predictions has generated concerns about due process, reliability, and potential bias. At least at present, studies show a persistent public perception of a “human-AI fairness gap,” reflecting the view that human adjudications, for all of their flaws, are fairer than whatever the machine spits out.

Both AI models and humans have terrible problems with bias. Let’s call this a two-way tie for last.

Many professional tennis tournaments, including the US Open, have replaced line judges with optical technology to determine whether 130 mile per hour serves are in or out. These decisions involve precision to the millimeter. And there is no discretion; the ball either did or did not hit the line. By contrast, legal determinations often involve gray areas that still require application of human judgment.

With this analogy, I don’t know if C.J. Roberts is archly referring to his famous quip during his 2005 confirmation hearing that he saw his role as chief justice merely to “call balls and strikes”. This was a lawyerly soft-shoe for the ages, since federal judges hold the most concentrated power in the US government, and can rewrite the rules of the entire game.

Of course, C.J. Roberts is correct that many legal questions “involve gray areas”. Beyond that, the analogy to AI becomes strained. In tennis tournaments, electronic line-calling is a system that is designed for a specific purpose, and engineered to optimize factual correctness. That is simply not how today’s popular AI models are designed, nor how they work.

Machines cannot fully replace key actors in court. Judges, for example, measure the sincerity of a defendant’s allocution at sentencing. Nuance matters: Much can turn on a shaking hand, a quivering voice, a change of inflection, a bead of sweat, a moment’s hesitation, a fleeting break in eye contact. And most people still trust humans more than machines to perceive and draw the right inferences from these clues.

As a typographer I concur. C.J. Roberts echoes something I’ve said for many years, which is that words themselves—whether spoken or written—are only part of the message that gets conveyed. Presentation carries weight too. Still, I expect there will eventually be AI-powered successors to lie detectors that will claim to offer more reliable results, and attempt to make our traditional “trust [in] humans” seem comparatively quaint.

Appellate judges, too, perform quintessentially human functions. Many appellate decisions turn on whether a lower court has abused its discretion, a standard that by its nature involves fact-specific gray areas. Others focus on open questions about how the law should develop in new areas. AI is based largely on existing information, which can inform but not make such decisions.

One of the key disjunctions between AI and the law is that by virtue of being trained on existing data, AI models are unavoidably retrospective. But the law, because it intends to shape future behavior, is necessarily prospective.

For instance: suppose that next week, the US completely decriminalizes and deregulates marijuana, either through legislation or judicial ruling. At that point, every AI in existence will be unable to generate good guesses about any legal issue pertaining to marijuana. Google has many flaws, but it has long had the capacity to dynamically reprioritize its search results based on changes to the fabric of reality. AI models, by contrast, are comparatively lumbering beasts, only capable of incorporating new information through various methods of retraining, which are expensive and therefore occasional.

Rule 1 of the Federal Rules of Civil Procedure directs the parties and the courts to seek the “just, speedy, and inexpensive” resolution of cases. Many AI applications indisputably assist the judicial system in advancing those goals. As AI evolves, courts will need to consider its proper uses in litigation.

If I were a legal-writing teacher, I might critique C.J. Roberts’s assertion that “AI applications indisputably assist the judicial system” as overly conclusory. Where are the examples and evidence? What is the basis for this confidence? How is it possible to make “indisputable” assertions about any aspect of a commercially nascent technology?

The public debut of AI over the last 18 months has been, shall we say, something less than an unalloyed success. Contra C.J. Roberts, there is plenty about today’s AI systems that is unjust, slow, and expensive. Indeed, if AI is introduced into the judicial system overeagerly, this will likely itself be litigated as a violation of US constitutional rights—say, violation of equal protection, of due process, of the right of access to the courts, of the right to counsel, or of the right to a jury. (Though if these constitutional challenges to AI arise, I predict that certain citizens will advocate for amending the Constitution, not the AI.) To be fair, I’m the guy who thinks courts mandating Times New Roman for filings are violating the Constitution.

In the federal courts, several Judicial Conference Committees—including those dealing with court administration and case management, cybersecurity, and the rules of practice and procedure, to name just a few—will be involved in that effort. I am glad that they will be. I predict that human judges will be around for a while. But with equal confidence I predict that judicial work—particularly at the trial level—will be significantly affected by AI. Those changes will involve not only how judges go about doing their job, but also how they understand the role that AI plays in the cases that come before them.

We can be glad that the federal judiciary is starting to grapple with AI. But I fear C.J. Roberts is peering down the wrong end of the telescope. By initially situating AI within the frame of office automation, his commentary flows from the premise of what will happen when AI is put in the hands of judges, justices, and other human agents capable of reasoning critically about the capabilities and limitations of this technology. That’s fine as far as it goes.

But judicial officers are by far the smallest segment of people involved with the court system. Therefore, it’s hard not to imagine that these benign policies of productive automation from within the judiciary will be utterly overwhelmed by the unproductive uses of AI that will be perpetrated—often unwittingly—from without, by lawyers and litigants appearing before these courts. As these AI tools spread into common use, courts will find themselves inundated by AI-assisted filings that nonchalantly weave truths, half-truths, and outright fibs into an epic flood of bullshit.

Indeed, C.J. Roberts spends no time at all on what might be called “defensive” AI countermeasures that could become necessary. When we think of what will be required of courts to stanch this flood, the AI future seems less “just, speedy, and inexpensive” and more like it could amount to a denial-of-service attack on the courts themselves, as AI-loving lawyers create an ever-escalating burden for judicial officers, clerks, and opposing counsel.

Still not alarmed? Consider this—none of these consequences require malign actors or next-generation AI. Mediocre AI in the hands of mediocre lawyers will suffice. Today, we already have plenty of both—precursor ingredients to a combustible reaction. They just need to be stirred together.

Worse, these depredations will become a form of regressive taxation. I agree with C.J. Roberts that court systems with more limited resources will embrace AI more eagerly. But they may well find that those limited resources are consumed from the other side to address the unintended consequences of AI.

Overall, I find the report vague and detached on the subject of AI. Especially over the last year, plenty of evidence has emerged to contest the sunny view of AI on display here. Though C.J. Roberts has taken up AI as a topic, I didn’t expect him to examine, say, the possibility of AI causing human extinction. But a richer analysis of the pros and cons still awaits. To be fair, perhaps that’s what the aforementioned “Judicial Conference Committees” will be doing.

Anyone working near the court system knows that the provision of judicial services can at times be difficult and messy. It is also one of the most crucial functions of our local, state, and federal governments, because it’s intrinsic to upholding the rule of law. So far, AI has a dubious relationship with that. As such, courts would do well to proceed more gingerly with AI than they did with, say, the photocopier. In many areas, it seems likely humans are going to do the right thing with AI only after we’ve exhausted all other possibilities. The courts will—hopefully—strive for a higher standard.

PS—possible uses for AI in the courts

Machine learning is an approximation technique. Thus, it’s peculiar that AI proponents keep recommending it for problems that require precise, correct, deterministic answers, when these are the kind of results that machine-learning AI systems are worst at providing.

The mathematics underpinning machine learning date from the 19th century, when the French mathematician Augustin-Louis Cauchy was studying the motion of celestial objects. The polynomials describing these motions had six variables, which Cauchy, armed primarily with pen and ink, found too difficult to solve directly. Instead, Cauchy developed the technique of gradient descent to find approximate answers to these equations, which in many cases sufficed. Today, gradient descent is the key to training machine-learning AI systems, and the polynomials can have millions of variables. (Yes, my first major in college was math.)
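
For the curious, here is a minimal sketch of gradient descent in Python. The six-variable function, the step size, and the iteration count are all illustrative assumptions of mine, not Cauchy's actual equations. The point is only that the method creeps toward an answer by repeated small steps, and stops at "close enough" rather than "exact".

```python
def gradient_descent(grad, start, step=0.1, iterations=200):
    """Repeatedly take a small step against the gradient, starting from `start`."""
    point = list(start)
    for _ in range(iterations):
        g = grad(point)
        point = [p - step * gi for p, gi in zip(point, g)]
    return point

# Hypothetical six-variable polynomial: f(x) = sum of (x_i - target_i)^2.
# Its minimum sits exactly at TARGET, so we can see how close the guess gets.
TARGET = [1.0, -2.0, 3.0, 0.5, -1.5, 4.0]

def f(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))

def grad_f(x):
    # Partial derivative of f with respect to each variable.
    return [2 * (xi - ti) for xi, ti in zip(x, TARGET)]

approx = gradient_descent(grad_f, start=[0.0] * 6)
print(approx)      # an approximation of TARGET, not an exact solution
print(f(approx))   # very small, but generally not exactly zero
```

Training a machine-learning model is this same loop, with a far larger list of variables and a far messier function.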

Thus, the question I wish were posed more often about AI and machine learning is what problems are these tools well adapted to solve? Too frequently, these systems are held out as oracular spectaculars when they are better understood as guessers. But a cheap, fast, and reasonably reliable guess can often be valuable, especially as a precursor or supplement—not substitute—for human reasoning and agency. And especially in situations where one has a large set of similar items that would benefit from prioritization.

I dislike anthropomorphic metaphors for AI, but let’s go with it for a minute: let’s think of places where an AI with the capacity of an enthusiastic but inexperienced law-school student might add value. For instance:

For filing clerks, AI could make a first guess whether each electronically filed document complies with court rules. Hundreds of documents are filed a day.
For judges, AI could make a first guess whether the citations in a brief accurately summarize the cited authorities, and allow the computation of a “credibility score” that would suggest whether the brief deserves tougher-than-usual human scrutiny.
For jury-pool managers, AI could estimate how many potential jurors would need to be summoned based on the types of cases going to trial, to ensure enough are on hand.
For courts, AI could help determine whether judges have conflicts of interest requiring recusal, supplementing existing conflict-checking procedures, detecting subtler conflicts that humans are apt to overlook. Not my idea—in his 2021 report, C.J. Roberts conceded that “the information systems that help courts catch and prevent conflicts are due for a refresh”. Wish granted, possibly.

In each of these examples, however, I’m not proposing that AI replace any of the human-enacted review procedures already in place. Just that AI could help prioritize the targets of that review, or act as a double-check. Unfortunately, automation bias and a general orientation toward cost efficiency will mean that configurations like these, once adopted, will tend to drift toward more reliance on AI and away from human engagement.
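
To make the prioritization idea concrete, here is a rough sketch in Python of how a “credibility score” might reorder a review queue without removing anything from it. Everything here is hypothetical: the `Brief` type, the per-citation guesses between 0 and 1, and the choice to average them are my assumptions for illustration, not a description of any existing court system.

```python
from dataclasses import dataclass

@dataclass
class Brief:
    name: str
    citation_scores: list[float]  # hypothetical AI guesses in [0, 1], one per citation

def credibility_score(brief):
    """Average the per-citation guesses. A brief with no scores gets 0.0,
    so it lands at the front of the human review queue."""
    if not brief.citation_scores:
        return 0.0
    return sum(brief.citation_scores) / len(brief.citation_scores)

def review_order(briefs):
    """Return every brief, lowest score first. Nothing is dropped:
    the score changes the order of human review, not whether it happens."""
    return sorted(briefs, key=credibility_score)

briefs = [
    Brief("Motion A", [0.9, 0.95, 0.85]),
    Brief("Motion B", [0.9, 0.1, 0.8]),   # one citation looks badly off
    Brief("Motion C", [0.6, 0.7]),
]
queue = review_order(briefs)
print([b.name for b in queue])
```

The design choice that matters is in `review_order`: it sorts but never filters. The drift I worry about starts the day someone adds a cutoff below which humans stop looking.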

These examples also suppose that the judiciary has access to an AI model that has been trained on an appropriate corpus of data—in the sense of legally and ethically obtained, and suitable to the task—and that is not hobbled by hallucinations. Today this does not exist. Do you want the survival of the US rule of law to be contingent on answers from an AI trained on shadow libraries and Reddit comments? Be careful what you wish for.

update, 495 days later

On the subject of laundering hallucinations into law—a new order criticizing a faulty AI-assisted brief says: “Plaintiff’s use of AI affirmatively misled me. I read their brief, was persuaded (or at least intrigued) by the authorities that they cited, and looked up the decisions to learn more about them—only to find that they didn’t exist. That’s scary. It almost led to the scarier outcome (from my perspective) of including those bogus materials in a judicial order.”