
DARPA Is Building a Google for the Dark Web and It Actually Works

Memex has already helped the government locate human traffickers and it's not even finished yet.

You know how sometimes when you're searching for something specific on Google you just can't seem to narrow down the results enough? Either you end up with 1.2 million vaguely related hits or else "No results containing all your search terms were found." Well, the Department of Defense has that problem too.

So the DoD's Defense Advanced Research Projects Agency has procured engineers and developers to create a better way of finding very specific information on the web, including the deep web. The program is called Memex, and it's already helped the DoD to track down human traffickers on the dark web since it started last year.


"It's not building a better search engine, exactly. It's more like building a search engine factory," explained Juliana Freire, a computer science and engineering professor at New York University who was recently enlisted to work on the project.

"Our goals are to make it easy for people who are not computer scientists to search for very specific information on the web and to provide infrastructure that allows them to seamlessly go from searching to analysis," she told me.

Several different teams across the United States are working on different pieces of the software suite that will allow for very specific, deep dives into the web. Freire and her colleagues are concentrating on the system's crawlers: the bots that methodically browse the web and index pages. They're creating "focused crawlers" that can be programmed to identify specific sites.

"What focused crawlers attempt to do is to gather as many relevant pages as possible while minimizing the number of pages that they actually crawl. The performance is measured by what we call the 'harvest ratio': the relevant pages divided by the number of irrelevant pages," Freire said.
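The idea behind a focused crawler can be sketched in a few lines. The toy "web," page names, and relevance test below are entirely hypothetical, not Memex internals: the crawler simply expands links found on relevant pages before links found on irrelevant ones, then reports the harvest ratio as Freire describes it (relevant pages divided by irrelevant pages crawled).

```python
from collections import deque

# Toy in-memory "web": page -> (is_relevant, outgoing links).
# All names and the relevance flags are illustrative placeholders.
PAGES = {
    "seed": (True,  ["a", "b", "c"]),
    "a":    (True,  ["d", "e"]),
    "b":    (False, ["f"]),
    "c":    (True,  ["g"]),
    "d":    (False, []),
    "e":    (True,  []),
    "f":    (False, []),
    "g":    (True,  []),
}

def focused_crawl(seed, budget):
    """Crawl up to `budget` pages, preferring links found on relevant pages."""
    frontier = deque([seed])  # high-priority queue: links from relevant pages
    low = deque()             # low-priority queue: links from irrelevant pages
    seen = {seed}
    relevant = irrelevant = 0
    while (frontier or low) and relevant + irrelevant < budget:
        page = frontier.popleft() if frontier else low.popleft()
        is_relevant, links = PAGES[page]
        if is_relevant:
            relevant += 1
            target = frontier  # expand relevant pages' links first
        else:
            irrelevant += 1
            target = low
        for link in links:
            if link not in seen:
                seen.add(link)
                target.append(link)
    return relevant, irrelevant

rel, irr = focused_crawl("seed", budget=6)
ratio = rel / max(irr, 1)  # harvest ratio as described in the quote
```

A real focused crawler would replace the lookup table with HTTP fetches and the boolean flags with a trained relevance classifier, but the prioritization logic (and the metric it is judged by) is the same.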

Other groups are focused on information extraction—like identifying certain images or pulling facts from a document—or finding solutions for cataloging the deep web, Freire told me. But it's not just intended for security and defense. All of the software developed will be made open-source so everyone from data journalists to scientific researchers can use it to collect and analyze hard-to-get information online.

"Some people are already piloting some of the things we're developing, for example the folks in the New York City District Attorney's office. For them, this kind of technology is really useful when they're trying to build their cases," Freire said.

While the project was launched with a particular goal of helping the DoD crack down on human trafficking, watching the government secure better tools for shining bright lights into every corner of the web can make some people feel uneasy. But at least with the software being open-source, we'll have a better idea of what the government is capable of.