Why Google's Neural Networks Look Like They're on Acid
Robo-tripping with a neural network.
Recently, a mysterious photo appeared on Reddit showing a monstrous mutant: an iridescent, multi-headed, slug-like creature covered with melting animal faces. Soon, the image's true origins surfaced, in the form of a blog post by a Google research team. It turned out the otherworldly picture was, in fact, inhuman. It was the product of an artificial neural network—a computer brain—built to recognize images. And it looked like it was on drugs.
Many commenters on Reddit and Hacker News noticed immediately that the images produced by the neural network were strikingly similar to what one sees on psychedelic substances such as mushrooms or LSD. "The level of resemblance with a psychotropics trip is simply fascinating," wrote Hacker News commenter joeyspn. User henryl agreed: "I'll be the first to say it... It looks like an acid/shroom trip."
The media picked up on the same thing. Tech Times: "Google Takes Artificial Neural Networks On An Awesome Acid Trip." Tech Gen Mag: "Google's new 'Inceptionism' software dreams psychedelic art." PBS: "Left to Their Own Devices, Computers Create Trippy, Surrealist Art."
Is the psychedelic look of these images just a coincidence, or is there some sort of fundamental parallel between how Google's neural network created these images and what our brains do when confronted with psychedelics?
Artificial neural networks (ANNs) are computing systems loosely modeled on the human brain. They've existed since the early 1950s, but over the last few years they've made amazing advancements in image recognition. The networks are made up of software-based "neurons," which communicate and alter their connection strengths to reflect the results of their calculations, much as real neurons do. This adaptability is what makes ANNs special. It gives them the ability to learn.
Like human children, neural networks learn by taking in information about the world around them. This data is usually fed directly into the system by people. If a neural network designed to identify images sees 100 photos of dogs, it will begin to recognize a dog on its own. The more photos of dogs it sees, the better it will get. If the network sees a photo of a dog-shaped thing, a specific neuron in the network's uppermost layer will become highly activated, and the network will spit out its result: dog. With these skills, ANNs have become essential for recognizing features and faces in images, the kind of thing that Google's new photo service takes advantage of to create automated albums and films.
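For readers who want to see "altering connection strengths" in miniature, here is a rough sketch of a single artificial neuron learning from labeled examples. It is a toy and not Google's system: the two-number "photos" (how furry, how wheeled), the function names, and the learning rate are all assumptions made up for illustration, and the update rule is the classic perceptron rule rather than anything the article's researchers described.

```python
def train_neuron(examples, epochs=50, learning_rate=0.1):
    """Perceptron rule: adjust connection strengths whenever the guess is wrong."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            guess = 1 if activation > 0 else 0
            error = label - guess  # 0 when right, +1 or -1 when wrong
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, features):
    """Return 1 ("dog") if the neuron is activated, else 0 ("not dog")."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


# Hypothetical "photos" reduced to (has_fur, has_wheels); label 1 means dog.
EXAMPLES = [
    ([1.0, 0.0], 1),
    ([0.9, 0.1], 1),
    ([0.0, 1.0], 0),
    ([0.1, 0.9], 0),
]

weights, bias = train_neuron(EXAMPLES)
print(predict(weights, bias, [0.8, 0.2]))  # a furry, mostly wheel-free thing
```

A real image network has millions of these connection strengths stacked in layers, but the basic move is the same: guess, compare against the label, nudge the weights.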
A convolutional neural network, the type Google used to create these strange images, consists of layers of neurons that send messages up a chain of command, interpreting information with more detail and abstraction as it moves upward, so that each layer only focuses on one small task. Because the network teaches itself, what exactly goes on in each of those layers is still largely a mystery. Google doesn't know what exact pathways information is taking or even entirely how the "division of labor" is broken down between the layers.
Google's experiment was intended to crack open these layers and see what was happening inside. The researchers declined to talk to us for this piece, but this is what we believe they did, based on similar experiments in the past. Instead of asking the network to identify images, they "turn[ed] [it] upside down," using a hill climbing algorithm, which starts from random noise and incrementally changes an image to find something that causes the neurons for a specific shape—be it banana, measuring cup, or dumbbell—to become highly active.
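The hill climbing idea can be sketched in a few lines. This is a toy under heavy assumptions: the four-pixel "image," the TEMPLATE standing in for a trained "banana neuron," and the step size are all invented for illustration. A real experiment would score candidates using the network itself, and would more likely follow gradients than make random guesses.

```python
import random

# Hypothetical stand-in for a trained neuron: it responds most strongly
# when the image matches this 4-pixel pattern.
TEMPLATE = [0.9, 0.1, 0.8, 0.2]


def activation(image):
    """Higher is better; reaches its maximum of 0.0 when image equals TEMPLATE."""
    return -sum((p - t) ** 2 for p, t in zip(image, TEMPLATE))


def hill_climb(steps=5000, step_size=0.05, seed=0):
    rng = random.Random(seed)
    image = [rng.random() for _ in TEMPLATE]  # start from random noise
    start_score = best_score = activation(image)
    for _ in range(steps):
        candidate = list(image)
        i = rng.randrange(len(candidate))
        nudged = candidate[i] + rng.uniform(-step_size, step_size)
        candidate[i] = min(1.0, max(0.0, nudged))  # keep pixels in [0, 1]
        score = activation(candidate)
        if score > best_score:  # keep only changes that excite the neuron more
            image, best_score = candidate, score
    return image, start_score, best_score


image, start_score, best_score = hill_climb()
print([round(p, 2) for p in image])
```

The image that comes out is whatever the neuron "wants" to see, which is why inspecting it reveals what the network has actually learned.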
By examining these results, the researchers could measure how accurate the machine's knowledge was. The results weren't always exactly on point — for example, each image produced for "dumbbell" featured not just a metal weight, but also the muscular arm attached to it. That provided a valuable insight: the computer had probably only ever seen a dumbbell with an arm attached.
The most interesting images were produced when researchers let the machine interpret landscapes, like a field with a single tree in the foreground, or visual noise, like a fuzzy television screen. Researchers looked at which neurons were activated by the landscapes or noise, and then fed the resulting image back into the network, iterating and adjusting the image until the photo became an enhanced, magnified representation of what the computer "saw." The tree in the landscape became a pack of floating dogs, surrounded by towers and strange wheeled figures.
When the researchers extracted images from the lower levels of the network, which detect things like lines and colors, the results looked as if they had been painted with thick, curving brush strokes in the style of a Van Gogh painting. When they ran images through the higher levels, which recognize whole objects like dogs, over and over, trees transformed into floating mutant dogs and mountain ranges transformed into pagodas.
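The feedback loop described above, in which the network's own impression is fed back in until it takes over the picture, can be mimicked with a toy. Everything here is assumed for illustration: the two detector patterns, the four-pixel "cloud," and the blending rate are made up, and this is not Google's code. It only shows how a faint hint can snowball when each round pushes the image toward whatever already scores highest.

```python
# Hypothetical feature detectors: each pattern is what one "neuron" responds to.
DETECTORS = {
    "dog": [1.0, 0.0, 1.0, 0.0],
    "pagoda": [0.0, 1.0, 0.0, 1.0],
}


def responses(image):
    """How strongly each detector fires on the image (a simple dot product)."""
    return {
        name: sum(p * w for p, w in zip(image, pattern))
        for name, pattern in DETECTORS.items()
    }


def amplify(image, rounds=20, rate=0.2):
    """Repeatedly push the image toward whatever the detectors already favor."""
    image = list(image)
    winner = None
    for _ in range(rounds):
        scores = responses(image)
        winner = max(scores, key=scores.get)
        pattern = DETECTORS[winner]
        # Move every pixel part of the way toward the winning pattern.
        image = [p + rate * (w - p) for p, w in zip(image, pattern)]
    return image, winner


# A nearly flat "cloud" with a barely dog-like tilt.
cloud = [0.52, 0.50, 0.51, 0.50]
dreamed, seen = amplify(cloud)
print(seen, [round(p, 2) for p in dreamed])
```

The starting image is almost featureless, yet after a few rounds the slight lean toward one pattern has been magnified into the whole picture.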
These images were obviously weird, but why did they look so much like the visuals we see on psychedelics? To answer that, I first needed to look at how our brains recognize images. This process is very similar to how ANNs do image detection. In humans, visual information comes through the eye and travels down the optic nerve to the base of the visual cortex. There, our brains perform some basic tests: searching for edges, determining whether lines are vertical or horizontal, and looking for colors and hues. Once processed, this data is then passed up the command chain to more and more sophisticated processing units, where our brains can begin to determine if what we're looking at is an apple or a car.
The main difference between our visual processing and that of neural networks is the amount of feedback from different areas of the brain, says Melanie Mitchell, a professor of computer science at Portland State University, who has written a book on neural networks.
Google's neural network is "feed forward"—it's a one-way street where data can only travel upward through the layers. By contrast, our brains are always communicating in a million directions at once. Even when we've only seen basic edges and lines, our upper brain may begin to tell us "that might be a beach umbrella," based on our prior knowledge that umbrellas are usually next to sand and waves, for example. The final information that gets passed to our consciousness—what we see—is a composite of visual data and our upper brain's best interpretation of that data. This works perfectly until we encounter something that fools our brain, like an optical illusion.
Taking hallucinogenic drugs dramatically alters this finely tuned process. "The normal ways that areas of the brain are connected and communicate break down," says Frederick Barrett, a cognitive neuroscientist who studies psychedelics in Johns Hopkins' Behavioral Pharmacology department. As the brain tries out different and more numerous connections, the frontal cortex and other controlling areas of the brain, which normally mediate the firehose of sensory information that comes from the outside, are weakened, leaving it up to other parts of the brain to interpret the deluge of information we receive from our eyes. Overwhelmed with data, the less advanced layers of the brain are forced to make their best guesses about an image.
Anyone who's ever tripped knows that there is a certain set of prototypical psychedelic visuals common to most experiences: think of the work of Alex Grey or paisley, the popular 1970s pattern. Barrett says there's a decent explanation for this commonality: it hinges on serotonin 2A receptors, which are thought to be one of the primary receptors on which psychedelic drugs work. We have a great number of 2A receptors in the visual cortex. Since the receptors exist low in the processing chain, the information they feed us is largely lines, shapes and colors. It's up to the rest of our brain to interpret this information, but when we're on drugs, our usually strict higher-functioning areas are not at their peak capacity. Thus, we end up seeing kaleidoscopic, fractal images as an overlay on surfaces. These visuals are coming directly from the base of the brain. In some ways, it's like peeking into the black box of our mind, seeing the puzzle pieces that put our regular perception together.
"[Google's images are] very much something that you'd imagine you'd get with psychedelics or during hallucinations," says Karl Friston, a professor of neuroscience at University College London, who helped invent an important brain imaging protocol. "And that's perfectly sensible. [During a psychedelic experience] you are free to explore all the sorts of internal high-level hypotheses or predictions about what might have caused sensory input." He adds, "[This parallel exists] because the objectives of the brain and the objectives of the Google researchers are the same, basically—to recognize stuff and then act in the most effective way."
"What [Google] are talking about with neural networks approximates well what happens in the brain and what we know about the visual system," Barett agrees. But he thinks we're still far from creating a neural network that accurately models the brain. "The complexity of the brain is such that I'm not sure if you can model [it] with artificial neural networks. I just don't know if we've gotten there yet, or even anywhere close," he says.
"Use them with care, and use them with respect as to the transformations they can achieve, and you have an extraordinary research tool," wrote Alexander Shulgin, the "Godfather of Ecstasy," in his book Pihkal. He was talking about drugs, and the human mind, possibly the most complex and dangerous tool that's ever existed. People, for millennia, have turned their own minds upside down with these substances, trying to get a better look at what we've learned and what we're still learning. Google's artificial brains remind us that there's plenty of research left to be done.
Correction: An earlier version of this story referred to Google's neural network as "convoluted;" the correct word is "convolutional."