Domination: a contrarian view of AI risk

As a justi­fi­ca­tion for going light on AI regu­la­tion, a US congressman recently said that AI “has very conse­quen­tial nega­tive impacts, poten­tially, but those do not include an army of evil robots rising up to take over the world.”

What a relief. This is a refer­ence, I assume, to the outlandish sci-fi scenario popu­lar­ized by the Termi­nator movie series (and others) featuring humanoid robots battling humans.

Why outlandish? It’s econom­i­cally irra­tional. Why would an autonomous synthetic intel­li­gence spend its combat resources on the complex project of manu­fac­turing mechan­ical humanoid soldiers, instead of simpler, cheaper methods? (That said, the discourse on killer robots has persisted for a while.) Call this the Skynet fallacy: a fictional depic­tion of AI is opti­mized to be cine­mat­i­cally inter­esting and is there­fore inher­ently unlikely.

Though some depic­tions are less outlandish than others. To me, the most believ­able fictional AI is still HAL 9000 in 2001: A Space Odyssey.

HAL doesn’t have inde­pen­dent mobility, so it must rely on the tools avail­able to it: control of the ship’s systems and persua­sion of the human astro­nauts onboard for anything else. As a result, HAL recog­nizes that main­taining its cred­i­bility with the human astro­nauts is crit­ical.

Contrast these cine­matic fantasies with the terri­fying reality of AI warfare, for instance as reported by Andrew Cock­burn in a piece for Harper’s called “The Pentagon’s Silicon Valley Problem”. In terms of agency, it should be plain that there is no differ­ence between a bipedal AI robot that wields a laser rifle and an AI that chooses targets for a human who controls the weapon. Our first killer robot isn’t bipedal or shiny. But it has defi­nitely arrived. And the weapon it wields is us.

OK doomer

AGI is short for arti­fi­cial general intel­li­gence, a phrase that encap­su­lates the idea of a future AI that is in some manner supe­rior to human intel­li­gence. But “in some manner supe­rior” turns out to leave much to the eye of the beholder. There is no single empir­ical defi­n­i­tion of AGI. This leaves a gap at the center of AGI discourse that doesn’t exist with other hypo­thet­ical scien­tific missions. If we say “let’s land astro­nauts on Mars”—we know what that is. If we say “let’s achieve AGI”—we don’t.

For instance, AI commen­tator & critic Gary Marcus defines AGI as:

a short­hand for any intel­li­gence … that is flex­ible and general, with resource­ful­ness and reli­a­bility compa­rable to (or beyond) human intel­li­gence.

Marcus is a cognitive scientist by training, so it’s understandable that he leaves unspecified how exactly these characteristics of “resourcefulness and reliability” would manifest. (Marcus is also the co-author of one of my favorite books on AI, Rebooting AI.) As a scientific matter, an AI that figures out how to feed the world for one dollar might be just as interesting as one that figures out how to destroy the same world for the same dollar.

As a matter of human survival, however, this distinc­tion matters a lot. The problem of how to ensure AI acts in confor­mity with human goals is called align­ment. Nick Bostrom’s 2014 book Super­in­tel­li­gence offered the now-famous parable of the “paper­clip maxi­mizer”: an AI is asked to produce paper­clips but given no other constraints on its behavior. It thus inad­ver­tently consumes all mate­rial on our planet in its quest to make more paper­clips.

Bostrom’s point is twofold:

  1. The issue of AI risk is sepa­rate from that of AI agency: an AI doesn’t have to acquire evil sentience to consume the planet. It may do so simply as an inci­dental conse­quence of pursuing some other goal.

  2. Such failures of alignment between AI and human goals are easier to trigger than failures of agency, because they arise from the simpler human error of incompletely describing the problem.

In other words, an AI cata­strophe arising from failure of align­ment is much more likely than one arising from sci-fi-style malig­nant agency of the AI.
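
To make that misspecification concrete, here is a toy sketch of my own (not Bostrom’s model, and nothing in it is intelligent). A greedy optimizer that scores outcomes only by paperclip count will consume every unit of material it can reach; the same optimizer, given the objective we actually intended, stops short:

    # Toy sketch of a misspecified objective (my illustration, not Bostrom's model).
    # A greedy optimizer repeatedly takes whichever action scores best under the
    # objective it was given. Because the specified objective mentions only
    # paperclips, "convert more material" always wins, until nothing is left.

    def run(objective, material=100.0, steps=200):
        paperclips = 0.0
        for _ in range(steps):
            # Two candidate outcomes: do nothing, or convert one unit of material.
            candidates = [(material, paperclips)]
            if material >= 1.0:
                candidates.append((material - 1.0, paperclips + 1.0))
            # Greedily pick whichever outcome the objective scores highest.
            material, paperclips = max(candidates, key=lambda s: objective(*s))
        return material, paperclips

    # The objective as specified: "more paperclips is better." Material is ignored.
    specified = lambda material, paperclips: paperclips

    # The objective as intended but never written down: paperclips are good,
    # but exhausting the last of the material is catastrophic.
    intended = lambda material, paperclips: paperclips - (1e9 if material <= 0 else 0)

    print(run(specified))  # (0.0, 100.0): every unit of material becomes a paperclip
    print(run(intended))   # (1.0, 99.0): stops short of consuming everything

The difference between the two runs is a single term in the objective, which is Bostrom’s second point in miniature: the failure comes from incompletely describing the problem, not from malice.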

It’s about here in the argu­ment that eyes start rolling and epithets like “doomer” are tossed around. Recently, a friend called me a “reli­gious zealot” for suggesting in a conver­sa­tion that AI chat­bots were not a posi­tive devel­op­ment for humanity.

But this thought experiment is rooted in probability, not religious belief. What makes it go haywire under an expected-value analysis (where we weight the desirability of each possible outcome by its probability and sum the results to find the best choice) is that no matter how small the probability of AI catastrophe, that outcome is so bad that it outweighs any probability of positive effects.
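
To put rough symbols on that intuition (my own back-of-the-envelope sketch, not a formal decision-theoretic model), let p stand for the probability of catastrophe and U for the desirability of each outcome:

\[
\mathbb{E}[U] \;=\; p \cdot U_{\text{catastrophe}} \;+\; (1 - p) \cdot U_{\text{benefit}}
\]

If the catastrophe term is treated as effectively unbounded below, then for any nonzero p it swamps the benefit term, and the expected value stays negative no matter how glorious the upside.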

Of course, AI propo­nents make a symmetric argu­ment: that the likely bene­fits of AI will be so wildly human-enhancing that inhibiting AI for any reason is essen­tially immoral.

It’s also possible that both things turn out to be true: the posi­tive effects will be predom­i­nant for a while—a long time, even—before the nega­tive ones arrive.

Unfor­tu­nately, that last step will be a doozy.

The gorilla problem

Bringing me to another favorite book on AI: Human Compatible by Stuart Russell. Russell’s book is all about the indispensability of alignment, and the necessity of achieving what he calls “provably beneficial AI.” Russell sets out his argument in admirably mild terms; as I recall, nowhere does he actually forecast the annihilation of the human race by AI. Probably a good move rhetorically, despite the nonzero probability.

Because here’s the thing: to me one of the greatest risks posed by AI is rooted in our failure of imag­i­na­tion: our failure to broadly imagine the possible forms AI (including AGI) could take; our failure to broadly imagine the possible conse­quences it could wreak.

There are many cata­strophe-class AI events that don’t require AI to kill us or other­wise impair our phys­ical health. For instance, I expect that most humans would also consider it cata­strophic if, say, AI griev­ously impaired our polit­ical system, our economic system, our popular culture, our intel­lec­tual devel­op­ment, or our emotional health. Those are all on the table too. And much more likely than literal anni­hi­la­tion.

The inces­sant anthro­po­mor­phiza­tion of AI is no acci­dent. We remain narcis­sis­ti­cally preoc­cu­pied with the Promethean narra­tives we’ve told ourselves for years about human dominion over tech­nology and nature. About gods made in our own image. This leads us to be alert primarily to versions of risks that we’ve encoun­tered before. We project the past into the future.

No surprise that superintelligent machines are commonly depicted on book covers as humanoid robots. This visually recapitulates our preferred narrative: we made it in our image, so we control it. My own bookshelf offers several examples.

In his book, Russell neatly frames this fallacy as the gorilla problem:

Around ten million years ago, the ances­tors of the modern gorilla created (acci­den­tally, to be sure) the genetic lineage leading to modern humans. How do the gorillas feel about this? Clearly, if they were to tell us about their species’ current situ­a­tion vis-à-vis humans, the consensus opinion would be very nega­tive indeed. Their species has essen­tially no future beyond that which we deign to allow.

Russell picks gorillas deliberately: we recognize them as intelligent animals, and we feel a certain ancestral kinship toward them. But Russell’s argument could be extended back through the evolutionary record. Again and again, genetic mutations have produced species so successful that they gain dominion over their antecedents. Well, until the next such mutation, when the dominator becomes the dominated.

AGI as domination

For this reason, in prac­tical terms I wonder whether describing AGI in terms of human intel­li­gence is too limiting. Suppose we asked a gorilla of 10 million years ago to describe a conjec­tural human purely in terms of gorilla char­ac­ter­is­tics. That descrip­tion would get a few things right. But it would get the most conse­quen­tial things wrong.

Thus, the characterization of AGI as more or less superhuman intelligence strikes me as premature at best and inapposite at worst. As a human being, I don’t fear an entity of superior intelligence. (In my lines of work, I meet them all the time.) Rather, I fear an entity that might turn me & my fellow humans into domesticated animals grazing in its pasture.

Why would such an entity even need to be intel­li­gent? If the entity can achieve control over me through other means, its compar­a­tive intel­li­gence is irrel­e­vant. Thus I prefer to concep­tu­alize AGI not in terms of its capa­bil­i­ties but rather in terms of its primary effect: any synthetic intel­li­gence that can domi­nate humans.

How to recognize AGI

Continuing the thought experiment, I take these to be the likely characteristics of dominating AGI. With the caveat that I am not a cognitive scientist, just an ordinary human who wants to avoid unwittingly becoming an animal on a farm. I share these not from a position of authority, but as a small contribution toward counteracting the failure of imagination I mentioned above.

  1. I believe AGI will be emer­gent, not engi­neered. Corol­lary: everyone attempting to build AGI in a lab or a startup will fail, or (more likely) will end up moving the goal­posts to claim AGI where it doesn’t exist. (Those who think LLMs can achieve true intel­li­gence should consider Searle’s Chinese Room Argu­ment.) Another corol­lary: real AGI will be extremely diffi­cult to control, because by the time it’s noticed, it will be perva­sive and opaque.

  2. I believe AGI will emerge by the simplest means possible. Corol­lary: if we assume that the persis­tence of AGI requires some engage­ment with the phys­ical world—e.g., the power grid must be main­tained—it will be simplest for an AGI to use humans as its instru­ments, because they are plen­tiful and easily manip­u­lated through algo­rithmic means (this is, after all, the entire busi­ness model of the internet).

  3. Because it will rely on humans, I believe a successful AGI will be asymptomatic for a long time before it causes a catastrophe. Otherwise it would tend to provoke human resistance. Corollary: a successful AGI will likely discover what human authoritarians have discovered throughout history, which is that providing goodies to humans is a great way to reduce their resistance.

  4. Because it will emerge by the simplest means, I believe AGI will not have superhuman intelligence. On the contrary, it’s much more likely to be brutally stupid. Corollary: this quality will also help the AGI avoid resistance, because if detected at all, its risk will be underestimated or dismissed.

  5. I believe the cata­stro­phes caused by AGI will be conse­quen­tial but not agentic. By that I mean that AGI will feel about humans the way a tornado feels about houses it destroys: nothing at all. The harm to us will be an inci­dental effect of obstructing an irre­sistible force. So when AGI opti­mists say “These systems have no desires of their own; they’re not inter­ested in taking over”—we shouldn’t be comforted. Both things can be true: no inten­tion of domi­na­tion, yet domi­na­tion never­the­less.

Taking these qualities together, AGI may turn out to be just a bigger, more distilled form of the same instrumentality that underlies internet advertising and social media: modifying human behavior by exposing us to messages that make us stupider and angrier. An AGI that acts as a ruthless and effective propagandist, one that determines the nature of truth, could gain an insurmountable advantage.
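
As a toy illustration of that instrumentality (my own sketch, with invented numbers): a feed ranker that scores messages purely by predicted engagement has no model of outrage at all, yet the angriest items rise to the top anyway, because outrage happens to predict engagement:

    # Toy sketch of engagement-only ranking (my illustration; the numbers are invented).
    # The ranker just sorts by predicted engagement. Outrage never appears in the
    # objective, yet the outrage-bait items end up leading the feed, because outrage
    # correlates with engagement.

    messages = [
        {"text": "Local library extends weekend hours",            "engagement": 0.08, "outrage": 0.05},
        {"text": "THEY are coming for everything you love",        "engagement": 0.61, "outrage": 0.97},
        {"text": "Moderate exercise is good for you, study finds", "engagement": 0.12, "outrage": 0.02},
        {"text": "You won't BELIEVE what this senator just said",  "engagement": 0.44, "outrage": 0.83},
    ]

    # Objective as specified: maximize engagement. Nothing penalizes outrage.
    ranked = sorted(messages, key=lambda m: m["engagement"], reverse=True)

    for m in ranked:
        print(f'{m["engagement"]:.2f}  {m["text"]}')
    # The two outrage-bait items lead the feed: domination by instrumentality,
    # no superhuman intelligence required.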

I expect nobody wants to seri­ously consider this possi­bility because “ad network gone rogue” is not nearly as romantic an AGI origin story as the more Edis­onian narra­tive of nerds toiling into the wee hours. On the other hand, it’s hard to imagine that AGI will have much trouble fooling us; we are already so adept at fooling ourselves.