Chief Justice John Roberts on AI and the judiciary

On 31 December 2023, US Supreme Court Chief Justice John Roberts published his 2023 Year-End Report on the Federal Judi­ciary. According to C.J. Roberts, “Every year, I use the Year-End Report to speak to a major issue rele­vant to the whole federal court system.” This year, that “major issue” is AI.

Before addressing AI directly, C.J. Roberts eases into the topic by drawing a parallel to the waves of technological change that have previously rolled ashore: from the typewriter, to the photocopier, to Justice Lewis Powell's "rented Wang computer", to the Supreme Court's Atex system, to personal computers, to CM/ECF, which "brought about a seismic shift in efficiency." (Perhaps so, though as a user of this system, I can report that not a single thing about CM/ECF has changed in decades, including its trapped-in-amber mid-90s web design.) C.J. Roberts notes that the federal judiciary still maintains a charmingly retro toll-free number for case information—for a good time, call 866-222-8029. Interestingly, C.J. Roberts doesn't mention any failed technologies, except for the Paige Compositor—though it doesn't appear the Supreme Court actually owned one of those.

Having set a some­what sunny tone, rooted in the pleasant history of office automa­tion, C.J. Roberts turns specif­i­cally to AI. These comments deserve closer scrutiny:

At its core, AI combines algo­rithms and enor­mous data sets to solve prob­lems.

Neither here nor later does C.J. Roberts mention that AI’s reliance on “enor­mous data sets” is already causing prob­lems, some of them legal prob­lems, some of which the Supreme Court may be called upon to solve. (GOFAI enthu­si­asts might quibble that C.J. Roberts consis­tently uses “AI” as a metonym for “machine learning”, but let’s go with it.)

Would the federal judiciary adopt a certain technology while it was still clouded by legal uncertainty? Perhaps it's happened before. A dishonorable mention in this category: Zoom, whose software is widely used by the federal judiciary for hearings, even though one of the company's executives was previously charged by the DOJ over his connections to China's intelligence services, and the company itself was the target of an FTC complaint for misleading users about its data-security practices. Nothing that ought to bother legal users, I guess?

Its many forms and appli­ca­tions include the facial recog­ni­tion we use to unlock our smart phones and the voice recog­ni­tion we use to direct our smart tele­vi­sions.

C.J. Roberts appar­ently means to suggest that AI has already seeped into our daily tech­nology inter­ac­tions—true—but in so doing, creates a ques­tion­able parallel between different kinds of AI that have qual­i­ta­tively different stakes. If a tele­vi­sion misun­der­stands a voice command, or if it takes two tries to unlock a phone—no big deal. But when we consider injecting AI into the courts, we are setting out a far more signif­i­cant task, with graver conse­quences should it go awry.

Law profes­sors report with both awe and angst that AI appar­ently can earn Bs on law school assign­ments and even pass the bar exam.

Surprisingly, C.J. Roberts doesn't pause to consider that lawyers themselves are part of the "technology stack" of the federal judiciary, and thus the effects of AI on legal education—for good or ill—will necessarily ripple out to the judiciary and possibly impose external costs. I have written previously about AI in classrooms, which I think is a horrible idea, though I know nothing about educational policy. But Prof. Amy J. Ko does, and agrees that AI threatens to erode core educational mechanisms. She and I are definitely in the minority, however.

Legal research may soon be unimag­in­able without it. AI obvi­ously has great poten­tial to dramat­i­cally increase access to key infor­ma­tion for lawyers and non-lawyers alike. But just as obvi­ously it risks invading privacy inter­ests and dehu­man­izing the law.

I would be very hesi­tant to put “AI” in the same sentence as “access to key infor­ma­tion”. The output of the typical AI language model is opti­mized for syntactic and linguistic plau­si­bility rather than any kind of under­lying semantic truth­ful­ness. Thus, in the same way that a broken clock is right twice a day, the AI may often emit some­thing that happens to be true. Other times—not. See also Searle’s Chinese Room argu­ment; see also Harry Frank­furt on bull­shit (“bull­shit­ters seek to convey a certain impres­sion of them­selves without being concerned about whether anything at all is true”); see also Gary Marcus, who has written exten­sively on the funda­mental lack of truth­ful­ness of AI language models—not in the sense of delib­er­ately lying, but in the sense of simply having no ability to tell the differ­ence between fact and fiction, any more than a color­blind animal can distin­guish red from green.
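
To make that concrete, here is a toy Python sketch (with invented probabilities, standing in for no real model) of the basic move a language model makes: it picks continuations that were statistically common in its training text, and nothing in the procedure checks the output against reality.

```python
import random

# Toy next-word distribution, standing in for what a language model learns
# from its training text. The numbers are invented for illustration only.
next_word_probs = {
    "The capital of Australia is": {
        "Sydney": 0.50,    # frequently written, but false
        "Canberra": 0.35,  # the true answer
        "Melbourne": 0.15, # also false
    },
}

def continue_text(prompt: str) -> str:
    """Pick a continuation weighted by plausibility alone.

    No step here consults a source of facts, so a statistically plausible
    falsehood is emitted as readily as a plausible truth."""
    choices = next_word_probs[prompt]
    return random.choices(list(choices), weights=list(choices.values()))[0]

for _ in range(5):
    print("The capital of Australia is", continue_text("The capital of Australia is"))
```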

For that reason, I worry less about “dehu­man­izing the law” and more about the law becoming perma­nently infested with AI-induced misin­for­ma­tion. On that point—I predict that within three years, there will be a published court deci­sion that includes a hallu­ci­nated AI cita­tion that snuck by every level of human veri­fi­ca­tion. The danger? Those who rely on the opinion later will assume—as they have tradi­tion­ally and reason­ably done—that the sources cited within are legit­i­mate. At that point, the hallu­ci­nated holding could be laun­dered into a real holding. To be clear: no mali­cious intent is needed anywhere in this process. All that is needed is the initial cred­u­lous­ness of a few.

I also wonder about the risks of focusing the law around the ideas most common in the AI model’s training dataset. True, most fact patterns that arise in legal cases are going to lie near the common cases. But some won’t. What happens to those? Will legal-oriented AIs know when a case calls for a novel argu­ment? “Well, many human lawyers wouldn’t notice either.” I’ve never followed this line of thinking—we should embrace tech­nolo­gies that do things as poorly as we do? To what end? This argu­ment also neglects the effect of automa­tion bias: the tendency of humans to give more weight to a result provided by a machine.

Propo­nents of AI tout its poten­tial to increase access to justice, partic­u­larly for liti­gants with limited resources. Our court system has a monopoly on many forms of relief. If you want a discharge in bank­ruptcy, for example, you must see a federal judge. For those who cannot afford a lawyer, AI can help. It drives new, highly acces­sible tools that provide answers to basic ques­tions, including where to find templates and court forms, how to fill them out, and where to bring them for presen­ta­tion to the judge—all without leaving home. These tools have the welcome poten­tial to smooth out any mismatch between avail­able resources and urgent needs in our court system.

This gives me pause, because it sounds a little like C.J. Roberts fore­sees a future where citi­zens of “limited resources” get steered toward a more limited form of justice—that dispensed by AI. Tradi­tion­ally, the judi­ciary has been more protec­tive of its vulner­able liti­gants. To expand on C.J. Roberts’s example: if a self-repre­sented liti­gant were to seek AI advice on bank­ruptcy filings, and the AI happens to hallu­ci­nate some­thing damaging, who ought to be account­able? The bank­ruptcy court, for recom­mending the AI? The AI vendor, for making it? The liti­gant, for using it?

But any use of AI requires caution and humility. One of AI’s promi­nent appli­ca­tions made head­lines this year for a short­coming known as “hallu­ci­na­tion,” which caused the lawyers using the appli­ca­tion to submit briefs with cita­tions to non-exis­tent cases. (Always a bad idea.) Some legal scholars have raised concerns about whether entering confi­den­tial infor­ma­tion into an AI tool might compro­mise later attempts to invoke legal priv­i­leges.

Curiously, C.J. Roberts seems to frame "hallucination" as a bug in a particular AI system, rather than as an intrinsic—and so far unsurmounted—limitation of an entire technology category. I dislike the term hallucination in any case because of its association with an altered state of mind or degenerate behavior. Whether an AI language model will emit a damaging hallucination is not a question of if, but when—it is entirely normal, probabilistic behavior. Maybe we can update that hoary old lawyer joke for the AI age—Q: How do you know when an AI lawyer is lying? A: It's producing output. (Hmm—maybe let's workshop that.)
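
The "when, not if" point is just arithmetic. Here is a minimal sketch, assuming a purely illustrative per-citation fabrication rate (not a measured property of any real system): even a small error rate compounds into near-certainty at volume.

```python
# Illustrative only: per_item_rate is an assumed number, not a measurement
# of any particular AI system.
def prob_at_least_one(per_item_rate: float, n_items: int) -> float:
    """P(at least one bad output in n independent tries) = 1 - (1 - p)^n."""
    return 1 - (1 - per_item_rate) ** n_items

for p in (0.01, 0.05):
    for n in (10, 100, 1000):
        print(f"error rate {p:.0%}, {n:4d} citations -> "
              f"{prob_at_least_one(p, n):.1%} chance of at least one fake")
```

Even at a one percent per-citation error rate, at least one fabricated citation becomes all but guaranteed somewhere across a thousand citations.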

In crim­inal cases, the use of AI in assessing flight risk, recidi­vism, and other largely discre­tionary deci­sions that involve predic­tions has gener­ated concerns about due process, reli­a­bility, and poten­tial bias. At least at present, studies show a persis­tent public percep­tion of a “human-AI fair­ness gap,” reflecting the view that human adju­di­ca­tions, for all of their flaws, are fairer than what­ever the machine spits out.

Both AI models and humans have terrible prob­lems with bias. Let’s call this a two-way tie for last.

Many profes­sional tennis tour­na­ments, including the US Open, have replaced line judges with optical tech­nology to deter­mine whether 130 mile per hour serves are in or out. These deci­sions involve preci­sion to the millimeter. And there is no discre­tion; the ball either did or did not hit the line. By contrast, legal deter­mi­na­tions often involve gray areas that still require appli­ca­tion of human judg­ment.

With this analogy, I don’t know if C.J. Roberts is archly refer­ring to his famous quip during his 2005 confir­ma­tion hearing that he saw his role as chief justice merely to “call balls and strikes”. This was a lawyerly soft-shoe for the ages, since federal judges hold the most concen­trated power in the US govern­ment, and can rewrite the rules of the entire game.

Of course, C.J. Roberts is correct that many legal ques­tions “involve gray areas”. Beyond that, the analogy to AI becomes strained. In tennis tour­na­ments, elec­tronic line-calling is a system that is designed for a specific purpose, and engi­neered to opti­mize factual correct­ness. That is simply not how today’s popular AI models are designed, nor how they work.

Machines cannot fully replace key actors in court. Judges, for example, measure the sincerity of a defen­dant’s allo­cu­tion at sentencing. Nuance matters: Much can turn on a shaking hand, a quiv­ering voice, a change of inflec­tion, a bead of sweat, a moment’s hesi­ta­tion, a fleeting break in eye contact. And most people still trust humans more than machines to perceive and draw the right infer­ences from these clues.

As a typog­ra­pher I concur. C.J. Roberts echoes some­thing I’ve said for many years, which is that words them­selves—whether spoken or written—are only part of the message that gets conveyed. Presen­ta­tion carries weight too. Still, I expect there will even­tu­ally be AI-powered succes­sors to lie detec­tors that will claim to offer more reli­able results, and attempt to make our tradi­tional “trust [in] humans” seem compar­a­tively quaint.

Appel­late judges, too, perform quin­tes­sen­tially human func­tions. Many appel­late deci­sions turn on whether a lower court has abused its discre­tion, a stan­dard that by its nature involves fact-specific gray areas. Others focus on open ques­tions about how the law should develop in new areas. AI is based largely on existing infor­ma­tion, which can inform but not make such deci­sions.

One of the key disjunc­tions between AI and the law is that by virtue of being trained on existing data, AI models are unavoid­ably retro­spec­tive. But the law, because it intends to shape future behavior, is neces­sarily prospec­tive.

For instance: suppose that next week, the US completely decrim­i­nal­izes and dereg­u­lates mari­juana, either through legis­la­tion or judi­cial ruling. At that point, every AI in exis­tence will be unable to generate good guesses about any legal issue pertaining to mari­juana. Google has many flaws, but it has long had the capacity to dynam­i­cally repri­or­i­tize its search results based on changes to the fabric of reality. AI models, by contrast, are compar­a­tively lumbering beasts, only capable of incor­po­rating new infor­ma­tion through various methods of retraining, which are expen­sive and there­fore occa­sional.
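
To make the asymmetry concrete, here is a hypothetical Python sketch (modeling no real search engine or AI product): an index can absorb a new document the moment it exists, while a trained model answers from parameters frozen at training time.

```python
from collections import defaultdict

# Hypothetical sketch; neither class models any real product.

class SearchIndex:
    """An inverted index: adding a new document is a cheap, immediate update."""
    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def search(self, word: str) -> set:
        return self.postings[word.lower()]

class FrozenModel:
    """A trained model: answers come from parameters fixed at training time.

    New facts have no effect until someone pays for another training run."""
    def __init__(self, parameters: dict):
        self.parameters = parameters

    def answer(self, question: str) -> str:
        return self.parameters.get(question, "no idea")

index = SearchIndex()
index.add("new-statute", "marijuana decriminalized and deregulated nationwide")
print(index.search("marijuana"))   # the new law is findable immediately

model = FrozenModel({"is marijuana legal?": "it depends on the state"})
print(model.answer("is marijuana legal?"))  # stale until retrained
```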

Rule 1 of the Federal Rules of Civil Proce­dure directs the parties and the courts to seek the “just, speedy, and inex­pen­sive” reso­lu­tion of cases. Many AI appli­ca­tions indis­putably assist the judi­cial system in advancing those goals. As AI evolves, courts will need to consider its proper uses in liti­ga­tion.

If I were a legal-writing teacher, I might critique C.J. Roberts’s asser­tion that “AI appli­ca­tions indis­putably assist the judi­cial system” as overly conclu­sory. Where are the exam­ples and evidence? What is the basis for this confi­dence? How is it possible to make “indis­putable” asser­tions about any aspect of a commer­cially nascent tech­nology?

The public debut of AI over the last 18 months has been, shall we say, some­thing less than an unal­loyed success. Contra C.J. Roberts, there is plenty about today’s AI systems that is unjust, slow, and expen­sive. Indeed, if AI is intro­duced into the judi­cial system overea­gerly, this will likely itself be liti­gated as a viola­tion of US consti­tu­tional rights—say, viola­tion of equal protec­tion, of due process, of the right of access to the courts, of the right to counsel, or of the right to a jury. (Though if these consti­tu­tional chal­lenges to AI arise, I predict that certain citi­zens will advo­cate for amending the Consti­tu­tion, not the AI.) To be fair, I’m the guy who thinks courts mandating Times New Roman for filings are violating the Consti­tu­tion.

In the federal courts, several Judi­cial Confer­ence Commit­tees—including those dealing with court admin­is­tra­tion and case manage­ment, cyber­se­cu­rity, and the rules of prac­tice and proce­dure, to name just a few—will be involved in that effort. I am glad that they will be. I predict that human judges will be around for a while. But with equal confi­dence I predict that judi­cial work—partic­u­larly at the trial level—will be signif­i­cantly affected by AI. Those changes will involve not only how judges go about doing their job, but also how they under­stand the role that AI plays in the cases that come before them.

We can be glad that the federal judiciary is starting to grapple with AI. But I fear C.J. Roberts is peering down the wrong end of the telescope. By initially situating AI within the frame of office automation, his commentary proceeds from the question of what will happen when AI is put in the hands of judges, justices, and other human agents capable of reasoning critically about the capabilities and limitations of this technology. That's fine as far as it goes.

But judi­cial offi­cers are by far the smallest segment of people involved with the court system. There­fore, it’s hard not to imagine that these benign poli­cies of produc­tive automa­tion from within the judi­ciary will be utterly over­whelmed by the unpro­duc­tive uses of AI that will be perpe­trated—often unwit­tingly—from without, by lawyers and liti­gants appearing before these courts. As these AI tools spread into common use, courts will find them­selves inun­dated by AI-assisted filings that noncha­lantly weave truths, half-truths, and outright fibs into an epic flood of bull­shit.

Indeed, C.J. Roberts spends no time at all on what might be called "defensive" AI countermeasures that could become necessary. When we think of what will be required of courts to stanch this flood, the AI future seems less "just, speedy, and inexpensive" and more like it could amount to a denial-of-service attack on the courts themselves, as AI-loving lawyers create an ever-escalating burden for judicial officers, clerks, and opposing counsel.

Still not alarmed? Consider this—none of these conse­quences require malign actors or next-gener­a­tion AI. Mediocre AI in the hands of mediocre lawyers will suffice. Today, we already have plenty of both—precursor ingre­di­ents to a combustible reac­tion. They just need to be stirred together.

Worse, these depre­da­tions will become a form of regres­sive taxa­tion. I agree with C.J. Roberts that court systems with more limited resources will embrace AI more eagerly. But they may well find that those limited resources are consumed from the other side to address the unin­tended conse­quences of AI.

Overall, I find the report vague and detached on the subject of AI. Especially over the last year, plenty of evidence has emerged to contest the sunny view of AI on display here. Having taken up AI as a topic, I didn't expect C.J. Roberts to examine, say, the possibility of AI causing human extinction. But a richer analysis of the pros and cons still awaits. To be fair, perhaps that's what the aforementioned "Judicial Conference Committees" will be doing.

Anyone working near the court system knows that the provi­sion of judi­cial services can at times be diffi­cult and messy. It is also one of the most crucial func­tions of our local, state, and federal govern­ments, because it’s intrinsic to upholding the rule of law. So far, AI has a dubious rela­tion­ship with that. As such, courts would do well to proceed more gingerly with AI than they did with, say, the photo­copier. In many areas, it seems likely humans are going to do the right thing with AI only after we’ve exhausted all other possi­bil­i­ties. The courts will—hope­fully—strive for a higher stan­dard.

PS—possible uses for AI in the courts

Machine learning is an approx­i­ma­tion tech­nique. Thus, it’s pecu­liar that AI propo­nents keep recom­mending it for prob­lems that require precise, correct, deter­min­istic answers, when these are the kind of results that machine-learning AI systems are worst at providing.

The math­e­matics under­pin­ning machine learning date from the 19th century, when the French math­e­mati­cian Augustin-Louis Cauchy was studying the motion of celes­tial objects. The poly­no­mials describing these motions had six vari­ables, which Cauchy, armed primarily with pen and ink, found too diffi­cult to solve directly. Instead, Cauchy devel­oped the tech­nique of gradient descent to find approx­i­mate answers to these equa­tions, which in many cases sufficed. Today, gradient descent is the key to training machine-learning AI systems, and the poly­no­mials can have millions of vari­ables. (Yes, my first major in college was math.)
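
For the curious, here is gradient descent at its smallest: a Python sketch that walks downhill on a one-variable function until it lands near the minimum. Training a modern AI model is the same move repeated over millions of parameters, and the result is an approximation, not an exact solution.

```python
def f(x: float) -> float:
    """The function to minimize: f(x) = (x - 3)^2 + 1, whose minimum is at x = 3."""
    return (x - 3) ** 2 + 1

def gradient(x: float) -> float:
    """Derivative of f; its sign tells us which direction is downhill."""
    return 2 * (x - 3)

x = 0.0              # arbitrary starting guess
learning_rate = 0.1  # how far to step downhill each iteration

for _ in range(50):
    x -= learning_rate * gradient(x)  # step against the gradient

print(x, f(x))  # x creeps toward 3.0: a good approximation, never an exact answer
```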

Thus, the question I wish were posed more often about AI and machine learning is what problems are these tools well adapted to solve? Too frequently, these systems are held out as oracular spectaculars when they are better understood as guessers. But a cheap, fast, and reasonably reliable guess can often be valuable, especially as a precursor or supplement to—not a substitute for—human reasoning and agency. And especially in situations where one has a large set of similar items that would benefit from prioritization.
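
To sketch what that could look like (with invented filings and scores; the model_score field stands in for whatever rough estimate an AI system might produce), the guess only reorders the review queue, and a human still reads everything:

```python
# Hypothetical example: the filings and scores below are invented.
filings = [
    {"id": "23-cv-101", "model_score": 0.12},  # estimated need for close review
    {"id": "23-cv-102", "model_score": 0.91},
    {"id": "23-cv-103", "model_score": 0.47},
]

# Sort by the model's guess, highest priority first. Every filing still
# reaches a human reviewer; only the order changes.
review_queue = sorted(filings, key=lambda f: f["model_score"], reverse=True)

for filing in review_queue:
    print(f"human reviews {filing['id']} (score {filing['model_score']:.2f})")
```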

I dislike anthropomorphic metaphors for AI, but let's go with it for a minute: let's think of places where an AI with the capacity of an enthusiastic but inexperienced law-school student might add value. For instance: triaging high volumes of similar filings so clerks know which ones need the closest human attention first, or double-checking briefs and draft opinions for citations that don't resolve to real authorities.

In each of these exam­ples, however, I’m not proposing that AI replace any of the human-enacted review proce­dures already in place. Just that AI could help prior­i­tize the targets of that review, or act as a double-check. Unfor­tu­nately, automa­tion bias and a general orien­ta­tion toward cost effi­ciency will mean that config­u­ra­tions like these, once adopted, will tend to drift toward more reliance on AI and away from human engage­ment.

These exam­ples also suppose that the judi­ciary has access to an AI model that has been trained on an appro­priate corpus of data—in the sense of legally and ethi­cally obtained, and suit­able to the task—and that is not hobbled by hallu­ci­na­tions. Today this does not exist. Do you want the survival of the US rule of law to be contin­gent on answers from an AI trained on shadow libraries and Reddit comments? Be careful what you wish for.