Researchers Tricked AI Into Doing Free Computations It Wasn't Trained to Do

Facial recognition systems have become ruthlessly efficient at picking people out of a crowd in recent years, and people are finding ways to thwart the artificial intelligence that powers them. Research has already shown that AI can be fooled into seeing something that’s not there, and now researchers have shown that these algorithms can also be hijacked and reprogrammed.

Despite recent advances, the technology behind facial recognition, a form of machine vision built on deep learning, leaves much to be desired. Many machine vision algorithms are still at a point where they’re liable to make mistakes, such as mislabeling a turtle as a gun. These mistakes can be weaponized by subtly manipulating images so that they cause computers to “see” specific things: for example, a sticker on a sign can cause a self-driving car to think it’s actually a stop sign.


As detailed in a recent paper posted to arXiv, three Google Brain researchers have taken this type of malicious image manipulation (called adversarial examples) a step further and demonstrated that small changes to images can actually force a machine learning algorithm to do free computations for the attacker, even if it wasn’t originally trained to do these types of computations. This raises the possibility that attackers could hijack our increasingly AI-driven smartphones by exposing them to subtly manipulated images.

Deep learning uses artificial neural networks—a type of computing architecture loosely modeled on the human brain—to teach machines how to recognize patterns by feeding them a lot of data. So, for example, if you wanted to teach a neural network how to recognize an image of a cat, you’d feed it tens of thousands, if not millions, of examples of cat pics so that the algorithm can determine general parameters that constitute “cat-ness”. If you then present the machine with an image that contains an object that falls within these parameters, and if this training stage was successful, it will determine that the image contains a cat with a high degree of certainty.
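To make the idea of learning general parameters from examples concrete, here is a minimal sketch. It is a deliberately tiny stand-in, not a deep network: a single logistic unit trained by gradient descent on made-up two-dimensional “cat” and “not cat” points that play the role of images. The data, function names, and hyperparameters are all illustrative assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function: maps a score to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(examples, epochs=200, lr=0.1):
    """Learn weights w and bias b by stochastic gradient descent on log loss."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x0, x1), y in examples:
            p = sigmoid(w[0] * x0 + w[1] * x1 + b)
            err = p - y  # derivative of the log loss with respect to the score
            w[0] -= lr * err * x0
            w[1] -= lr * err * x1
            b -= lr * err
    return w, b


def prob_cat(w, b, x):
    """The model's confidence that feature vector x is a 'cat'."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


# Made-up training data: 'cat' examples cluster near (2, 2), others near (-2, -2).
rng = random.Random(0)
cats = [((2 + rng.gauss(0, 0.5), 2 + rng.gauss(0, 0.5)), 1) for _ in range(100)]
others = [((-2 + rng.gauss(0, 0.5), -2 + rng.gauss(0, 0.5)), 0) for _ in range(100)]
w, b = train(cats + others)
print(prob_cat(w, b, (1.9, 2.2)))    # high confidence: resembles the 'cat' examples
print(prob_cat(w, b, (-2.1, -1.8)))  # low confidence
```

Real image classifiers learn millions of such weights across many layers, but the loop is the same in spirit: score an example, measure the error, and nudge the parameters to reduce it.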

Strange things start to happen with machine vision neural nets, however, when they’re fed certain pictures that would be meaningless to humans. Shown a picture that looks like nothing but static, a machine vision algorithm might be very confident that the image contains a centipede or a leopard. Moreover, static can be overlaid on normal images in a way that’s imperceptible to humans, yet throws the machine vision algorithm for a loop, much like how smart TVs can be triggered to perform various tasks by audio that is inaudible to humans.
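Why faint static can matter so much is easiest to see on a linear model. The sketch below uses the fast gradient sign idea: nudge every pixel a tiny amount in whichever direction lowers the classifier’s score. It assumes random stand-in weights and a linear score rather than a real deep network, so it illustrates the mechanism only; each change is small, but across thousands of pixels the changes add up.

```python
import random


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def score(w, x, b=0.0):
    """Linear classifier score; positive means 'leopard' in this toy."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fgsm_linear(w, x, eps):
    """Fast gradient sign step against a linear score.

    The gradient of w.x with respect to x is just w, so moving every pixel
    by -eps * sign(w_i) lowers the score as fast as possible while changing
    no pixel by more than eps. Results are clipped to the valid range [0, 1].
    """
    return [min(1.0, max(0.0, xi - eps * sign(wi))) for xi, wi in zip(x, w)]


rng = random.Random(1)
n = 10_000                                   # number of "pixels"
w = [rng.uniform(-1, 1) for _ in range(n)]   # stand-in for learned weights
x = [rng.uniform(0, 1) for _ in range(n)]    # an ordinary input image
b = 5.0 - score(w, x)                        # pick b so the clean image scores +5
x_adv = fgsm_linear(w, x, eps=0.01)          # at most a 1% change per pixel

print(score(w, x, b))      # about +5: classified as 'leopard'
print(score(w, x_adv, b))  # strongly negative: the label flips
print(max(abs(a - c) for a, c in zip(x_adv, x)))  # never more than 0.01
```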

The Google Brain researchers took the concept of adversarial examples a step further with adversarial reprogramming: causing a machine vision algorithm to perform a task other than the one it was trained to perform.
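Conceptually, the attack leaves the classifier’s own weights untouched and learns only an “adversarial program”: a frame of pixels drawn around a smaller task input placed in the center of the image. The sketch below follows, as we understand it, the formulation described in the paper, where the program is tanh(W ⊙ M) for learnable weights W and a mask M that blanks out the center. The function names, image sizes, and random W are illustrative assumptions, and the training step that would actually optimize W is omitted.

```python
import math
import random


def make_mask(size, inner):
    """1.0 where the program may draw (the border), 0.0 in the central
    inner x inner window reserved for the task input."""
    start = (size - inner) // 2
    stop = start + inner
    return [[0.0 if start <= r < stop and start <= c < stop else 1.0
             for c in range(size)] for r in range(size)]


def pad_center(small, size):
    """Zero-pad a small square image so it sits in the middle of a canvas."""
    inner = len(small)
    start = (size - inner) // 2
    canvas = [[0.0] * size for _ in range(size)]
    for r in range(inner):
        for c in range(inner):
            canvas[start + r][start + c] = small[r][c]
    return canvas


def apply_program(small, W, mask):
    """Adversarial input = padded task input + tanh(W * mask), elementwise.
    tanh keeps the program's pixels bounded; the mask keeps it off the center."""
    size = len(W)
    x = pad_center(small, size)
    return [[x[r][c] + math.tanh(W[r][c] * mask[r][c]) for c in range(size)]
            for r in range(size)]


SIZE, INNER = 12, 4
rng = random.Random(0)
# W is the only thing the attacker would learn (for example, by gradient descent
# through the frozen classifier); random values stand in for a trained program.
W = [[rng.gauss(0, 1) for _ in range(SIZE)] for _ in range(SIZE)]
mask = make_mask(SIZE, INNER)
task_input = [[1.0 if (r + c) % 2 else 0.0 for c in range(INNER)] for r in range(INNER)]
x_adv = apply_program(task_input, W, mask)
```

The design point is that the victim network is never modified: the same frame W is reused for every task input, and only the small center patch changes from query to query.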

The researchers demonstrated a relatively simple adversarial reprogramming attack in which a machine vision algorithm originally trained to recognize things like animals was made to count the number of squares in an image. To do this, they generated images that consisted of psychedelic static with a black grid in the middle. Some of the squares in this 4×4 black grid were randomly selected to be white. This was the adversarial image.
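Here is a sketch of how the grid part of such inputs might be generated, assuming a 4×4 grid of equal-sized cells. The 3-pixel cell size and the function names are hypothetical choices, not taken from the paper, and the surrounding psychedelic static is left out. The number of white cells is the ground-truth answer the hijacked classifier is supposed to report.

```python
import random


def make_counting_grid(n_white, rng, grid=4, cell=3):
    """Return a (grid*cell) x (grid*cell) image of black (0.0) cells in which
    n_white randomly chosen cells are white (1.0)."""
    if not 0 <= n_white <= grid * grid:
        raise ValueError("n_white must be between 0 and grid * grid")
    size = grid * cell
    img = [[0.0] * size for _ in range(size)]
    for idx in rng.sample(range(grid * grid), n_white):
        row, col = divmod(idx, grid)  # which cell of the grid to whiten
        for r in range(row * cell, (row + 1) * cell):
            for c in range(col * cell, (col + 1) * cell):
                img[r][c] = 1.0
    return img


def count_white_cells(img, cell=3):
    """Ground truth for the counting task: inspect one pixel per cell."""
    return sum(1 for r in range(0, len(img), cell)
               for c in range(0, len(img[0]), cell) if img[r][c] == 1.0)


rng = random.Random(42)
sample = make_counting_grid(7, rng)
print(count_white_cells(sample))  # 7
```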

The researchers then mapped the possible square counts to image classes from ImageNet, a massive database used to train machine learning algorithms. The mapping was arbitrary: each ImageNet class simply stood for a number of white squares in the adversarial image. For example, an adversarial image containing two white squares would return the label ‘Goldfish,’ while one with 10 white squares would return ‘Ostrich.’
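The mapping itself is just a lookup table. The sketch below assumes that counts 1 through 10 map, in order, to the first ten ImageNet-1k classes, which agrees with the examples given here (2 → goldfish, 4 → tiger shark, 9 → hen, 10 → ostrich); the names `COUNT_TO_LABEL` and `decode_count` are hypothetical.

```python
# Assumption: counts 1-10 map, in order, to the first ten ImageNet-1k classes
# (class indices 0-9). This agrees with the examples in the text.
FIRST_TEN_IMAGENET_CLASSES = [
    "tench", "goldfish", "great white shark", "tiger shark", "hammerhead",
    "electric ray", "stingray", "cock", "hen", "ostrich",
]

COUNT_TO_LABEL = {n: label for n, label in enumerate(FIRST_TEN_IMAGENET_CLASSES, start=1)}
LABEL_TO_COUNT = {label: n for n, label in COUNT_TO_LABEL.items()}


def decode_count(predicted_label):
    """Turn the classifier's answer back into a square count.
    Any of the other 990 ImageNet labels means the attack failed: return None."""
    return LABEL_TO_COUNT.get(predicted_label)


print(COUNT_TO_LABEL[2], COUNT_TO_LABEL[10])  # goldfish ostrich
print(decode_count("hen"))                     # 9
```

Because the mapping is arbitrary, any ten labels would work equally well, as long as the attacker uses the same table when training the adversarial program and when reading off the answers.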

(a) Represents the mapping of the number of squares in an adversarial image (left column) to the ImageNet label (right column). (b) shows these adversarial squares being added to a static image to produce two different images, one representing a tiger shark (four squares), the other representing an ostrich (10 squares). (c) shows that when these adversarial images are fed to an ImageNet classifier, it returns the labels tiger shark and ostrich, which tells the attacker it had counted 4 and 10 squares, respectively, even though the ImageNet classifier had only been trained to recognize objects such as animals in images. Image: arXiv

The scenario modeled by the researchers assumed that an attacker knows the parameters of a machine vision network trained to do a specific task, and then reprograms the network to do free computations on a task it wasn’t originally trained to do. In this case, the neural network being attacked was an ImageNet classifier trained to recognize animals.

To manipulate this ImageNet classifier into doing free computations, the researchers embedded the black grids with white squares in 100,000 ImageNet images and then let the machine vision network classify them as usual. If an image contained a grid with nine white squares, for example, the network would report that it saw a hen, and the researchers would know it had correctly counted nine squares, since this was the ImageNet class that mapped to the number nine.

The technique turned out to be remarkably effective. The algorithm counted squares correctly in over 99 percent of the 100,000 images, the authors wrote.

Although this was a simple example, the researchers argue that these types of adversarial reprogramming attacks could be much more sophisticated in the future.

“A variety of nefarious ends may be achievable if machine learning systems can be reprogrammed by a specially crafted input,” the researchers wrote. “For instance, as phones increasingly act as AI-driven digital assistants, the plausibility of reprogramming someone’s phone by exposing it to an adversarial image or audio file increases. As these digital assistants have access to a user’s email, calendar, social media accounts, and credit cards the consequences of this type of attack also grow larger.”