
The Computer Program That Would Teach Robots How to See

Levan is creating a "visual encyclopedia of everything" to help robots learn how to see.

Robots and artificial intelligence are really great at looking up information quickly—remember when Watson smacked down a bunch of Jeopardy champs? But they're not so great at seeing something and knowing what they're looking at. A new computer program is trying to change that: its programmers hope to create an artificial intelligence platform that can look at an object and then teach itself everything there is to know about it.


Like I just said, humans (especially ones with a bit of expertise) outrun machines when it comes to knowing what we're looking at—it's one reason subreddits like r/Whatsthisbug are so popular. There's just no good way to take a photo and ask a computer to figure out, specifically, what the heck this thing is. But the University of Washington's Levan program is systematically compiling a visual encyclopedia of, quite literally, everything, so that robots will one day be able to see.

Levan can scan the web for a concept, say, "boat," and compile a visual list of every variant it can find. The program can distinguish between a scout boat, a pirate boat, a tug boat, a sunken boat, a fishing boat, a sauce boat—you get the idea. The goal is to eventually create a program that can literally learn "everything about anything," so that when you see a sweet car on the road, you can snap a picture of it, plug it into a forthcoming app, and get told that it's a 1964 Corvette.

Some examples of different types of "boats" that Levan found. Image: Levan
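To get a feel for the variant-mining step, here's a rough sketch in Python. Levan's actual pipeline isn't spelled out in this story, so treat the function name, the toy corpus, and the counting heuristic as assumptions; the point is just that modifier words appearing right before a concept ("tug boat," "fishing boat") suggest its subcategories.

```python
from collections import Counter
import re

def find_subcategories(concept, corpus_lines, top_n=20):
    """Count modifier words that appear directly before `concept`
    in a text corpus, e.g. "tug boat" or "fishing boat"."""
    pattern = re.compile(r"(\w+)\s+" + re.escape(concept) + r"\b")
    modifiers = Counter()
    for line in corpus_lines:
        for match in pattern.finditer(line.lower()):
            modifiers[match.group(1)] += 1
    return [f"{mod} {concept}" for mod, _ in modifiers.most_common(top_n)]

# Toy corpus; a real system would stream phrase counts from something
# like the Google Books n-gram data, filter out stopwords ("the boat"),
# and then verify each surviving phrase against image search results.
corpus = [
    "The tug boat pulled the fishing boat past a sunken boat.",
    "A pirate boat is not the same as a sauce boat.",
]
print(find_subcategories("boat", corpus))
# ['tug boat', 'fishing boat', 'sunken boat', 'pirate boat', 'sauce boat']
```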

"We want it to search for things that we, as humans, don't know how to name—we might see a certain type of chair, but we don't know what kind of chair it is," Ali Farhadi, one of the researchers who is programming Levan, told me. "Well, you can take a picture of it, and Levan will immediately recognize it as a peacock chair, and can tell you who designed it."

Farhadi and his colleague, Santosh Divvala, released a beta version of Levan today. Right now, anyone can plug in a "concept"—basically any noun, verb, or adjective in the English language—and Levan will crawl the web looking for variants and "subcategories" of that concept. Once Levan becomes more efficient, it'll essentially trawl the internet on its own, teaching itself as it goes.


To "learn," Levan analyzes both the context a photo is posted in (it'll crawl through Google Books, Google image search, and several other databases to pull out different "variations" on any word) and the content of the image itself. Its algorithm can tell that, say, a picture of two people doing the tango is different than one of two people swing dancing. It'll add both into its database, and, if you show it a picture of one or the other in the future, it'll be able to determine which type of dance it is.

Levan analyzes photos to get a good idea of what certain concepts look like. Image: Ali Farhadi
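The content side can be sketched, too. The article doesn't describe Levan's actual detectors, so the following is only an illustrative stand-in: train a simple linear classifier on HOG features over folders of images scraped for each subcategory phrase. The folder layout, function names, and feature choice are all assumptions here.

```python
from pathlib import Path
from skimage.io import imread
from skimage.transform import resize
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(path, size=(128, 128)):
    """Load an image as grayscale, resize it to a fixed shape, and
    extract HOG descriptors, a classic pre-deep-learning summary of
    image content based on edges and gradients."""
    image = resize(imread(path, as_gray=True), size)
    return hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

def train_subcategory_classifier(folders):
    """`folders` maps a subcategory phrase ("tango", "swing dancing")
    to a directory of example images scraped for that phrase."""
    features, labels = [], []
    for label, folder in folders.items():
        for path in Path(folder).glob("*.jpg"):
            features.append(hog_features(path))
            labels.append(label)
    return LinearSVC().fit(features, labels)

# Hypothetical usage: classify a new photo as one dance or the other.
# clf = train_subcategory_classifier({"tango": "images/tango",
#                                     "swing dancing": "images/swing"})
# print(clf.predict([hog_features("images/mystery_dance.jpg")]))
```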

Admittedly, there are a lot of bugs. If you play around with Levan for a few minutes, you'll notice lots of duplicates and lots of things that just don't seem to fit.

"This is version 1.0," Divvala said. "By version six or seven, it'll be able to do a lot more things."

The idea is to release Levan as a smartphone app soon, so that in the near term you can start snapping pictures of things and Levan can start telling you what they are. But, in the future, Farhadi says, the program's underlying structure has major implications for robot vision. Eventually, a Levan-like program could be embedded in a camera-equipped robot, which would be able to go about its day-to-day robot life seeing a car as a car or a person as a person, instead of as simply a solid object to be avoided (the way driverless cars often do now, for instance).

And, at that point, we're that much closer to having robots who can act just like us.