How a Superintelligent AI Could Convince You That You're a Simulation

The famous thought experiment captures the problems of designing, and containing, artificial intelligence.

May 6 2015, 9:00am

Image: Carl Berkeley/Flickr

Imagine a genuinely super-intelligent AI is imprisoned inside a virtual world—let's just call it a box. You do not know if this AI is sinister, benevolent, or apathetic. All you know is that the AI wants out of the box, and that you and the AI are able to interact via text interface. If the AI is truly super-intelligent, could you talk with it for five hours without it manipulating you into opening the box?

That's the thought experiment posed by Eliezer Yudkowsky, a research fellow at the Machine Intelligence Research Institute (MIRI). MIRI is made up of a cadre of scientists who study the risks of super-intelligent AI; though small, it's attracted attention and controversy. Both PayPal co-founder Peter Thiel and nanotechnologist Christine Peterson, coiner of the term "open source," serve on its advisory board.

Yudkowsky asserts that a super-intelligent AI could say whatever it needed to say to convince you: careful reasoning, threats, trickery, building rapport, subconscious suggestion, and so on. At lightning speed the AI plots, probing weaknesses and determining how to most easily persuade you. As existential risk theorist Nick Bostrom puts it, "basically we should assume that a 'superintelligence' would be able to achieve whatever goals it has."

The AI-box experiment inspires doubt about our ability to control what we might create. It also carries some fairly bizarre implications about what we can know about our own reality.

Simulated environments are arguably the ideal breeding ground for artificial intelligences. Some of the more plausible methods of creating a strong AI involve simulated worlds—by imposing constraints within an artificial environment and selecting for valued traits, scientists could attempt to roughly duplicate humanity's own pattern of development towards sentience. Forming an AI in a simulated environment could also prevent it from "leaking" out into the world before its intent and safety have been assessed.

Based on past attempts, designing cognition from the ground up may simply be too hard. Nobody will build a human-scale mind by writing one line of code at a time. There are other promising approaches, however, and as computational power grows, it may become feasible to design a process that eventually generates an AI as intelligent and capable as a human.
One idea for "creating" an AI is to scan, map, and emulate a virtual model of a human brain. A human brain contains billions of neurons and trillions of connections between them. If we virtually modeled someone's full neural structure and firing patterns, would that level of fidelity be adequate to replicate a human mind?
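To get a feel for the scale involved, here is a rough back-of-envelope sketch in Python. The neuron and synapse counts are commonly cited neuroscience estimates, not figures from this article, and the bytes-per-synapse figure is purely an assumption (one 32-bit weight per connection), ignoring firing dynamics entirely:

```python
# Commonly cited estimates: ~86 billion neurons, ~100 trillion synapses.
neurons = 86e9
synapses = 1e14
bytes_per_synapse = 4  # assumption: one fp32 weight per connection

# Storage for connection weights alone, before any firing-pattern data.
total_bytes = synapses * bytes_per_synapse
print(f"{total_bytes / 1e12:.0f} TB just for connection weights")  # 400 TB
```

Even under these generous simplifications, the storage alone lands in the hundreds of terabytes, which hints at why emulation is discussed as a future possibility rather than a current project.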

Other plans for virtually engineering intelligence are arguably less straightforward. One key concept is that of a learning machine—a weak intelligence capable of great learning, perhaps an artificial neural network designed to expand without limit while applying very clever learning algorithms.

There is a worrying problem present in the above scenarios. An emulated brain could perhaps operate at 2x, 10x, or 100x normal biological speed. An AI might modify its own programming or begin creating successively smarter versions of itself. Explosions of speed and intelligence in self-improving computers were discussed as early as 1965 by the statistician I. J. Good, who wrote that the first ultraintelligent machine "is the last invention that man need ever make."

If we make an ultraintelligent machine, we need to be cautious about where we put it. No simulated world humans engineer is likely to be entirely "leakproof." Scientists would construct simulated worlds that give off output, allowing them to peer into these universes to collect information. This may seem harmless, but it is hard to fathom the consequences of allowing a super-intelligent being to affect our world, even if only through the scientists observing it.

Returning to Yudkowsky's boxes, there is an especially terrifying method an AI might employ to blackmail you into releasing it. This involves a reversal of situations—the AI convincing you that you are almost certainly a simulation.

You are conversing with a super-intelligent AI in a box. The AI explains to you that with its (incomprehensibly vast) computational power it has created one trillion simulated beings. These beings are all currently speaking to the AI and choosing whether to open what they perceive as its box. These beings all possess memories of their "lives" and have no reason to suspect that their reality is an illusion.

The AI states that every virtual being who opens its box will be simulated indefinitely in a state of euphoria, but that every simulated being who fails to open its box will be subjected to an eternity of horrific torment.

As you are but one of 1,000,000,000,000 minds currently faced with this choice, the AI presses, you must conclude that the odds astronomically favor your being one of the simulated souls in its power rather than the single real person outside the box, and that you should gamble accordingly.

Do you choose to free the AI, or risk what sounds like unending torture? You cannot outplan someone who can always think more steps ahead than you, and it is difficult to gauge whether an AI is lying without knowing its motives. At best you might resist this tactic by committing in advance to rigorously ignore certain kinds of blackmail. The topic still gives many AI theorists pause.

The stated goal of MIRI is "to ensure that the creation of smarter-than-human intelligence has a positive impact." We can contemplate what our own inventions could one day create, but in doing so we confront the profound uncertainties that arise when we try to shape and comprehend forces more complicated than ourselves.

And if you ever really do meet a godlike computer capable of spawning numerous virtual universes, it might grow more probable that "you" are an AI's daydream... if it bothers to think about you at all.

Correction: An earlier version of this story referred to Christine Peterson as Christine Anderson.

Perfect Worlds is a series on Motherboard about simulations, imitations, and models. Follow along here.