The dark web isn't really all that hard to index, if you know where to look.
For all the freaking out we do about the secrecy of the dark web (sites most often accessed with a tool called Tor), anyone with decent Google skills and enough motivation could find at least a handful of hidden service websites. So is the dark web actually all that hidden? Maybe not. One researcher is in the process of mapping the dark web using material pulled from the normal internet.
The vast majority of hidden service websites (that is, the ones you need Tor to reach) have to walk a fine line. They want to remain somewhat hidden, but not so hidden that no one can find them. As a result, directions to dark web sites are littered all over the public, or "clear," internet.
Staffan Truvé, CEO of Recorded Future, a Sweden-based cyber threat research company, has tracked down where users talk about the dark web and point each other to specific hidden sites, in an attempt to visualize what the dark web looks like. The company also continuously monitors many parts of the dark web itself.
"Some people are over-mystifying the dark web. It's not that magical. There's no sharp dividing line between the normal internet and the dark web," Truvé told me. "On sites like Pastebin, you can find lots of pointers to the darker sides of the web."
Recorded Future scrapes everything posted on Pastebin and other "paste" sites, where plain text can be posted anonymously. Pastebin is a popular place to find torrents and data dumps from hacks; back in December, it was where links to the files leaked in the Sony hack ended up. It's also a fantastic place to find links to dark web sites. Beyond paste sites, the company monitors Twitter and forums across the normal internet for links to the dark web.
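For readers curious what the scraping step involves, here is a minimal Python sketch of pulling hidden service addresses out of a paste. It assumes the standard Tor address formats (16 base32 characters for older v2 addresses, 56 for current v3 addresses, each followed by ".onion"); the pattern and the `extract_onion_links` name are illustrative and are not Recorded Future's code.

```python
import re

# Assumed address formats: v2 onion hostnames are 16 base32 characters
# (a-z, 2-7) and v3 hostnames are 56; both end in ".onion".
# The 56-character alternative is tried first so v3 addresses match whole.
ONION_RE = re.compile(r"\b(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion\b", re.IGNORECASE)


def extract_onion_links(paste_text: str) -> list[str]:
    """Return the unique .onion hostnames in a paste, in order of first appearance."""
    seen: dict[str, None] = {}  # dicts keep insertion order, so this dedupes stably
    for match in ONION_RE.finditer(paste_text):
        seen.setdefault(match.group(0).lower(), None)
    return list(seen)


if __name__ == "__main__":
    # Made-up paste text for illustration only.
    paste = "mirror: http://abcdefghijklmnop.onion/shop (also ABCDEFGHIJKLMNOP.onion)"
    print(extract_onion_links(paste))  # ['abcdefghijklmnop.onion']
```

A pattern like this only finds candidate addresses; whether each one actually resolves is a separate question that has to be answered over Tor.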
Last week, Truvé gave a presentation on Recorded Future's work called "Visualizing the Underbelly of the Internet," in which he said the dark web, by necessity, isn't all that secretive.
"If you're in this business to sell something, well, you need to advertise it somewhere," he said. "There are really a lot of shades of gray as to what I'd say is a dark web site. There's really obscure things on what we would consider 'normal' websites. I think the borderline is hacker forums and sites like Pastebin."
Many parts of the dark web aren't hard to find or visualize, but they are volatile. Truvé says that 10 percent of the dark web sites posted on Pastebin are deleted within 48 hours. He says most of those deleted sites point to illegal services, which are presumably used by criminals and then deleted very quickly.
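A figure like "deleted within 48 hours" depends on re-checking each captured link over time. The Python sketch below covers only the bookkeeping side of that, under stated assumptions: each link record stores when it was first captured and when a re-check first found it offline. The `LinkRecord` type and `share_gone_within` function are hypothetical, and the liveness checks themselves, which would have to run over Tor, are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class LinkRecord:
    """Hypothetical record for one dark web link captured from a paste site."""
    first_seen: datetime                # when the link was first scraped
    gone_at: Optional[datetime] = None  # first re-check that found it offline, if any


def share_gone_within(records: list[LinkRecord], window: timedelta) -> float:
    """Fraction of captured links that went offline within `window` of first being seen."""
    if not records:
        return 0.0
    gone = sum(
        1
        for r in records
        if r.gone_at is not None and r.gone_at - r.first_seen <= window
    )
    return gone / len(records)


if __name__ == "__main__":
    # Made-up records for illustration only.
    t0 = datetime(2015, 1, 1)
    records = [LinkRecord(t0) for _ in range(8)]
    records.append(LinkRecord(t0, t0 + timedelta(hours=30)))  # gone inside 48 hours
    records.append(LinkRecord(t0, t0 + timedelta(hours=72)))  # gone, but later
    print(share_gone_within(records, timedelta(hours=48)))  # 0.1
```

One caveat with this approach: hidden services drop offline for mundane reasons too, so a tracker would probably want several failed checks in a row before counting a site as deleted.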
"We capture everything that's posted on paste sites, and then we go back in and check if the links are still active," he said.
He says that by monitoring the dark web persistently, the company can tell what its users are talking about.
"We analyze all the contents and do standard linguistic analysis to try to determine what's popular on the dark web," he said. "Really, this is very similar to Google trends. We can tell what people are querying for."
Mapping the dark web doesn't necessarily make it any easier to crack down on what's there, a stated goal of the FBI and other law enforcement agencies. Nor does Truvé's service capture sites that aren't advertised anywhere on the normal internet.
He says it's hard to know how many sites the company is missing.
"It kind of goes back to Donald Rumsfeld's 'unknown unknowns,'" he said. "There's not really a good way to quantify it. We see the rest as being a law enforcement thing. We try to do a quantified analysis of the dark web, as best as we can."
And what about the encrypted parts of the dark web?
"We just ignore that content," he said. "We don't do code breaking."
So yes, many parts of the dark web aren't all that secret, but there's still a bit of mystique left.