Facebook Is Trying to Figure Out How to Automatically Detect Mirror Selfies
Why even the best artificially intelligent computer vision can't detect mirror selfies.
In the picture above, you see some dude (it me) taking a picture of himself in his dingy bathroom mirror. A classic mirror selfie, if you will.
When even the most advanced artificially intelligent image recognition software sees that image, however, it sees something very different. It might see a man, a phone, a white checkered shirt—this would be a cutting-edge analysis that's only newly possible, by the way—but it wouldn't identify the image as a mirror selfie, or a selfie at all.
Solving the mirror selfie problem is one of the most important open problems in machine vision: image recognition software can now describe what it sees fairly reliably, but it lacks the cultural knowledge and reasoning skills to put that description in context.
Larry Zitnick, a research manager in Facebook's artificial intelligence division, told a crowd at New York City's LDV Vision Summit Tuesday that the company is actively working on making its computer vision tools pass a "Turing test for reasoning and vision" by training them not only to recognize images, but to analyze them.
Zitnick said that, given a photo of a room like this one, Facebook could easily identify it as having a mirror in it. But give it the above mirror selfie, and a computer will come up with something completely different.
"If you show it a mirror, we can nail it," Zitnick said, pointing to an image like the one above. "To detect the mirror in [a mirror selfie], you need to have a much more deep understanding of the world. You have to understand how selfies are taken, so unfortunately it's really difficult."
Many of the presentations at LDV Vision Summit, where companies like Facebook and Nvidia talked about what they're working on and academic researchers discussed their latest work, focused on solving this analytical problem. Zitnick said that simply throwing more and more photos at today's image recognition tools won't turn machine vision into something closer to true artificial intelligence.
"Compared to where we were in 1984 [when image recognition research started heating up], it's basically solved. But we can't think of AI as recognition. We need reasoning and learning," Zitnick said.
There's no obvious path to teaching an artificially intelligent computer vision algorithm how to reason, but Zitnick says Facebook is currently trying what's known as "semi-supervised learning," in which humans help guide the development of the vision algorithm while also allowing it to make classifications on its own.
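To make the idea concrete, here's a minimal, toy sketch of one common semi-supervised technique, pseudo-labeling: a model is fit on a handful of human-labeled examples, labels the unlabeled data itself, then retrains on everything. The data and the nearest-centroid classifier here are invented for illustration; they aren't Facebook's actual method.

```python
# Toy 1-D "images": class 0 clusters near 0.0, class 1 clusters near 1.0.
labeled = [(0.1, 0), (0.2, 0), (0.9, 1), (0.8, 1)]   # few human-labeled examples
unlabeled = [0.15, 0.05, 0.85, 0.95, 0.25, 0.75]     # no human labels here

def centroid(points):
    return sum(points) / len(points)

# Step 1: fit class centroids on the human-labeled data alone.
centroids = {c: centroid([x for x, y in labeled if y == c]) for c in (0, 1)}

def predict(x):
    return min(centroids, key=lambda c: abs(x - centroids[c]))

# Step 2: the model assigns "pseudo-labels" to the unlabeled data on its own...
pseudo = [(x, predict(x)) for x in unlabeled]

# Step 3: ...and retrains on the combined set, refining the centroids.
combined = labeled + pseudo
centroids = {c: centroid([x for x, y in combined if y == c]) for c in (0, 1)}

print(predict(0.3))  # -> 0
print(predict(0.7))  # -> 1
```

The human contribution is the small labeled seed set; everything after that, the algorithm does for itself, which is the "semi" in semi-supervised.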
Facebook is also working with a dataset called Common Objects in Context, or COCO, in which the different objects in any given image are labeled so they can be used to determine what's going on in the image as a whole.
The general sentiment at the moment is that image recognition is technologically very impressive, but not all that useful to users.
"If I show it a picture of bananas and a bicycle, it can tell you there's a banana and a bicycle, which is incredible," Serge Belongie, a researcher at Cornell University, said. "But we're at a stage where people are not asking the computer to tell them something they don't know about the image."
Zitnick says that the goal in the future is to be able to answer questions about specific photos—an artificially intelligent image recognition program should be able to look at a pepperoni pizza and tell you it's not vegetarian, or look at a photo of a person wearing glasses and tell you that he or she likely doesn't have perfect vision. How we'll get to that point is anyone's guess.
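One crude way to picture the gap between recognition and reasoning is a hand-written rule layer sitting on top of a classifier's detected labels. This is purely illustrative (the rules, questions, and label lists are made up), and hard-coding rules like this is precisely what doesn't scale, which is why the problem remains open:

```python
# Hypothetical post-recognition reasoning: rules mapping detected labels to answers.
RULES = {
    "is it vegetarian?":
        lambda labels: "pepperoni" not in labels and "sausage" not in labels,
    "does the subject likely have perfect vision?":
        lambda labels: "glasses" not in labels,
}

def answer(question, labels):
    """Apply a reasoning rule to a classifier's label output, if one exists."""
    rule = RULES.get(question.lower())
    return rule(set(labels)) if rule else None

detected = ["pizza", "pepperoni", "table"]   # pretend output of an image classifier
print(answer("Is it vegetarian?", detected))  # -> False
```

Recognition supplies the labels; the interesting research question is how to learn the rule layer instead of writing it by hand.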
"What's so exciting to me is that we don't have a clear roadmap, we don't know how it's going to be solved, and we don't know how we're going to crack this learning problem," Zitnick said.