
Google's Bio-Inspired AI Beat 49 Vintage Video Games

Machine learning levels up.
Image: Google DeepMind

A team of Google researchers has crafted an artificial intelligence algorithm capable not just of learning how to play and beat vintage video games, but of doing so while starting out as a newborn "dumb" AI with no knowledge of the games' rules. Their system took on 49 games in all, and in some cases it bested professional (human) video game testers.

The research also cuts to some fundamental questions: Can we understand the world without being told its rules? Can we reasonably swap explicit laws (if this, then that, and so on) for a more statistics-based understanding of reality? For example, instead of being handed "gravity" as a law, we could learn it through observation: masses fall, almost all of the time. This seems annoyingly trivial, but for artificial intelligence and machine learning it's a fundamental and unsettled question.


To make an algorithm or piece of software behave autonomously, the cheap (but effective!) way to do things is to give it some rules to follow. If you were to teach a system to play a video game, you would do likewise. If the cubes stack taller than this point, the game is lost; if you approach an enemy past this point, the enemy will attack; if you jump before this point, you will fall in the pit. Etc. Just a bunch of if-thens that coalesce into a decent AI. Don't fall in the pit; don't stack the blocks taller than this; don't touch the goomba.
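To make that concrete, here is a minimal sketch of what the rule-based approach looks like in practice. The game state, the thresholds, and the action names are all hypothetical; the point is that a human has to know and hand-code every rule in advance.

```python
# A sketch of the rule-based approach: hand-written if-then checks against a
# hypothetical game state. Every threshold is a rule a human encoded up front.
def choose_action(state):
    if state["stack_height"] >= state["max_height"]:
        return "game_over"        # the blocks stacked too tall
    if state["pit_distance"] <= 1:
        return "jump"             # jump now, or fall in the pit
    if state["enemy_distance"] <= 2:
        return "attack"           # the goomba is close enough to strike
    return "walk_forward"         # nothing to worry about yet
```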

As described in a new paper in Nature, the group of researchers, drawn in part from Google DeepMind, has successfully taken the exact opposite approach. Rather than teach game-playing systems game rules, they programmed the systems to make minute observations. Again and again, the AIs played through a catalog of games, making statistical observations based only on pixels and scores. What pixels have a causal relationship with other pixels? Starting at this fine grain, the systems were able to eventually learn the games completely—and beat them.

The machine learning technique at work here is called Q-learning, a branch of a larger category known as reinforcement learning. This is a machine learning scheme in which a system progresses through some problem via a succession of evaluations: at each point, a decision is made and the algorithm receives some reward, or positive feedback, along with a new, updated state. The program weighs the value of each decision according to the immediate reward, but also the future state.


So, the system learns to think ahead as it learns how decisions influence the future. This has the helpful effect of encouraging experimentation, as the best decision is no longer a simple function of immediate cause and immediate effect. The "deep" part of deep learning comes in because a deep neural network is what estimates those values from raw pixels.
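The core of that idea fits in a few lines. What follows is the textbook tabular form of Q-learning, not DeepMind's actual network, but it shows how a value estimate gets nudged toward the immediate reward plus the discounted value of whatever comes next.

```python
# Textbook tabular Q-learning update (a sketch, not DeepMind's DQN).
# Q is assumed to be a dict of dicts: Q[state][action] -> estimated value.
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    best_future = max(Q[next_state].values())     # best value one step ahead
    target = reward + gamma * best_future         # reward now + discounted future
    Q[state][action] += alpha * (target - Q[state][action])
```

The discount factor gamma is what makes the algorithm care about the future at all; set it to zero and the update collapses back into pure immediate cause and effect.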

"We consider tasks in which the agent interacts with an environment through a sequence of observations, actions, and rewards," the study explains. "The goal of the agent is to select actions in a fashion that maximizes cumulative future reward. To achieve this,we developed a deep Q-network (DQN), which is able to combine reinforcement learning with a class of artificial neural network known as deep neural networks."

The agents in question here had only pixels and scores to go on, with no other knowledge of the games (or their rules). Gradually, they were able to learn how to better manipulate the pixels on the screen to achieve higher scores, not through single one-to-one causes and effects, but through long strings of changing states, a continuum of causes and effects through time. Data sets grow (and grow) and patterns develop.
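One way to picture that growing data set: every step of play produces a small transition record, and the only feedback is the change in score. The emulator calls below (get_frame, press, get_score) are hypothetical stand-ins, not the actual Atari interface DeepMind used.

```python
# A sketch of what the agent gets to see: raw pixels in, a score delta out.
# get_frame, press, and get_score are hypothetical emulator calls.
def play_step(emulator, agent, prev_score):
    state = emulator.get_frame()        # raw pixels, no rules attached
    action = agent.act(state)           # choose a joystick input
    emulator.press(action)              # advance the game
    next_state = emulator.get_frame()
    score = emulator.get_score()
    reward = score - prev_score         # did the score go up or down?
    return (state, action, reward, next_state), score
```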

This seems obvious because it's more or less how we operate. And the deep learning at work here was in fact inspired by biological neural networks.

"The human brain repeatedly solves non-trivial inference problems as we go about our daily lives, interpreting high-dimensional sensory data to determine how best to control all the muscles of the body," Bernhard Schölkopf, an intelligent systems researcher at the Max Planck Institute, writes in an accompanying Nature perspective. "Simple supervised learning is clearly not the whole story, because we often learn without a 'supervisor' telling us the outputs of a hypothetical input–output function."


So, we as intelligent biological systems learn by reinforcement. Trial and error.

"The system picks output actions on the basis of its current estimate of [the accumulated future reward], thereby exploiting its knowledge of a game's reward structure, and intersperses the predicted best action with random actions to explore uncharted territory," Schölkopf explains. "The game then responds with the next game screen and a reward signal equal to the change in the game score."

The success of Q-learning here is twofold. For one, the algorithm was about as good at playing the games as a human games tester. For another, it proved tremendously adaptable, learning and dominating 49 different Atari 2600 games, each with very different rules. It's actually even a bit deeper than that: the systems didn't just learn the games, they learned how to learn them in the first place. They started from nothing, in a sense. Newborn AI.

"In the early days of AI, beating a professional chess player was held by some to be the gold standard," Schölkopf offers. "This has now been achieved, and the target has shifted as we have grown to understand that other problems are much harder for computers, in particular problems involving high dimensionalities and noisy inputs. These are real-world problems, at which biological perception–action systems excel and machine learning outperforms conventional engineering methods."

Is their algorithm ready to take on the hardest Mario level in existence? Given enough time and computing resources, there's no reason why not.