There is an “overwhelming” presence of content related to illegal activities on easily accessible dark web sites, according to new research.
“The results suggest that the most common uses for websites on Tor hidden services are criminal, including drugs, illicit finances and pornography involving violence, children and animals,” Daniel Moore and Thomas Rid, both of King’s College London, write in Cryptopolitik and the Darknet, an essay and research project looking into the relationship between privacy and security. The piece will appear in the February/March issue of Survival, a journal from the International Institute for Strategic Studies (IISS).
As part of their research, the pair scraped Tor hidden services and analysed their content to build up a picture of what sort of services are hosted on the so-called dark web.
“The database analysis brought to light the overwhelming presence of illicit content on the Tor darknet,” they write.
Over a five-week period, the pair found 5,205 live websites, 2,723 of which they were able to classify by content. Of those, 1,547 hosted illicit material, or around 57 percent.
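For readers who want to check the arithmetic, the short Python sketch below reproduces the headline percentage from the three figures reported above. The variable names are illustrative and are not taken from the researchers' own code. The sketch also shows the share measured against all live sites, which is lower only because many of those sites were never classified.

```python
# Figures as reported in the article.
live_sites = 5205        # live websites found over the five-week crawl
classified_sites = 2723  # sites the researchers could classify by content
illicit_sites = 1547     # classified sites hosting illicit material

# Share of classified sites that hosted illicit material.
share_of_classified = illicit_sites / classified_sites

# Share of all live sites, for context. Unclassified sites are counted in
# the denominator here, so this says nothing about what they contained.
share_of_live = illicit_sites / live_sites

print(f"Illicit share of classified sites: {share_of_classified:.1%}")
print(f"Illicit share of all live sites:   {share_of_live:.1%}")
```

The 57 percent figure therefore describes the classified subset, not every site the crawler reached.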
Moore told Motherboard in a phone interview that one of the purposes of the research “is to introduce some new perspective that is somewhat moderate” to the debate on encryption.
Encryption is often a polarising issue. On one side of the debate, government officials call for sweeping powers to decrypt communications in order to catch criminals, despite the risks to everyone else’s data; on the other side, activists may support encryption tools without acknowledging their potential for abuse.
“We wanted to introduce a more nuanced discussion, and to stake out the middle ground between those two extremes, because obviously they can't both be right,” Rid added.
Moore and Rid's methodology relied on a Python-based web crawler; that is, a script that cycled through known hidden services, found links to other dark web sites, ripped their content, and then classified it into different categories.
The crawler started by going through lists of hidden services on Onion City and Ahmia, two popular dark web “search engines.” The script accessed sites multiple times in case they were offline at first (a common problem, as the uptime of hidden services can be erratic) and harvested up to 100 different pages from each site. Any content not in English was automatically translated by Google's online service.
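The crawling logic described above (retry sites that appear offline, stop at 100 pages per site, and note links to other hidden services) can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' script: the function names, the retry count of three and the regex-based link extraction are all assumptions, and `fetch` is a hypothetical callable that a real crawler would have to route through Tor. The translation step is left out.

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse

MAX_PAGES_PER_SITE = 100  # per-site cap mentioned in the article
MAX_ATTEMPTS = 3          # assumed; the article only says "multiple times"

# Simplified link extraction; a real crawler would use an HTML parser.
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def fetch_with_retries(url, fetch, attempts=MAX_ATTEMPTS):
    """Call fetch(url) up to `attempts` times; return None if every try fails."""
    for _ in range(attempts):
        try:
            return fetch(url)
        except OSError:
            continue  # site may be temporarily offline, so try again
    return None


def crawl_site(start_url, fetch, max_pages=MAX_PAGES_PER_SITE):
    """Breadth-first crawl of one site.

    Returns (pages, external_hosts): a dict mapping URL to HTML for up to
    `max_pages` pages, and the set of other .onion hosts linked from them.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    external_hosts = set()
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch_with_retries(url, fetch)
        if html is None:
            continue  # skip pages that never responded
        pages[url] = html
        for href in HREF_RE.findall(html):
            link = urljoin(url, href)
            link_host = urlparse(link).netloc
            if link_host == host:
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
            elif link_host.endswith(".onion"):
                external_hosts.add(link_host)  # candidate for a later crawl
    return pages, external_hosts
```

Passing `fetch` in as a parameter keeps the sketch testable without any network access: a stub that returns canned HTML, or raises `OSError` to simulate downtime, is enough to exercise the retry and page-cap behaviour.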
The classification was based on an algorithm that had been taught to split the content into various themes. At first, Moore manually categorised 600 documents under headings such as “drugs”, “social”, “financial”, and a number of others. If a page didn't display any content at all, or had fewer than 50 words, it was placed into the “none” category.
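The article does not say which algorithm the researchers used. As a rough illustration of the general approach (learn from a set of hand-labelled documents, then label new pages, with very short pages going to “none”), here is a small multinomial Naive Bayes classifier in plain Python. The choice of Naive Bayes, the tokeniser, and every name in the code are assumptions made for this sketch.

```python
import math
import re
from collections import Counter, defaultdict

MIN_WORDS = 50  # pages with fewer words go to "none", per the article


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed stand-in)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, labelled_docs):
        """Learn from an iterable of (text, label) pairs."""
        for text, label in labelled_docs:
            words = tokenize(text)
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)

    def predict(self, text):
        """Return the most likely label, or "none" for very short pages."""
        words = tokenize(text)
        if len(words) < MIN_WORDS:
            return "none"
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in words:
                score += math.log((counts[word] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A toy run, with made-up training text standing in for the 600 hand-labelled documents:

```python
clf = NaiveBayes()
clf.train([
    ("pills powder gram shipping stealth vendor", "drugs"),
    ("forum thread reply members chat post", "social"),
    ("bitcoin wallet escrow mixer laundering cards", "financial"),
])
long_page = "vendor ships pills and powder by the gram " * 10
print(clf.predict(long_page))    # long enough to be scored against each label
print(clf.predict("too short"))  # under 50 words, so it falls into "none"
```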
Once the system had been exposed to Moore's classification, it carried out the process automatically for the remainder of the dataset. After the classification was complete, Moore randomly selected 50 sites in order to double-check they had been sorted by the algorithm correctly. “Aside for two cases in which the social category was assigned to forums focused on exchanging information on the manufacturing of narcotics, the categories assigned were correct,” he writes.
Moore described the methodology to Motherboard as “measured and cautious.”
“We don't make any statements about the entire contents of Tor,” Moore said. “We just looked at what is the reasonable offering of hidden services to most users.” He added that it was of course possible there are additional hidden services that they did not come across.
“We went for what a user can actually see and interact with,” Moore said.
The paper concludes: “Tor's ugly example should loom large in technology debates. Refusing to confront tough, inevitable political choices is simply irresponsible. The line between utopia and dystopia can be disturbingly thin.”
Kate Krauss, spokesperson for the Tor Project, told Motherboard in an email, “The researchers seem to make conclusory statements about the value of onion services that lie outside the scope of their research results. Onion services are a tool with unique security properties used for a wide range of purposes: They are self authenticated, end-to-end encrypted, and offer NAT punching and the advantage of a limited surface area.”
At the time of writing, around 35,000 unique .onion addresses exist, according to the Tor Project's own figures. These numbers are reported by hidden-service directories, special nodes of the Tor network that can map the usage of hidden services. However, the figures do not necessarily equate to websites running as hidden services: they may also include other services, such as XMPP servers used for chatting.
The Tor Project declined to answer questions about the methodology used by Moore and Rid.
In all, this latest research provides a deeper, empirical basis for discussion around Tor hidden services, and perhaps encryption more generally.
Rid hopes that the research “will make it more difficult for anybody to just make these wholesale, rather disappointing statements about encryption. We're just beyond that point.”