Image: donvictorio/Shutterstock

The UK Wants to Store Every Citizen's Browsing Data. I Tried Collecting My Own

The draft Investigatory Powers Bill would store "internet connection records" for a year.

Nov 19 2015, 4:40pm


My digital life sits in one 8.5 MB folder. It contains reams of logs, detailing the connections between my computer and the internet. Every website I've visited, every online service I've used. It shows when I logged in, how long I stayed for, and when I moved on—whether I was researching articles, talking to friends, shopping, or feeding my insatiable YouTube habit. Thousands of lines of data, squeezed into dozens of Excel spreadsheets.

In an attempt to simulate the powers proposed under the UK's new draft surveillance law, I recorded my internet connection history over just four days. The proposals, if passed into law, will force internet service providers to retain data including the web browsing history of every UK citizen for 12 months.

***

The draft Investigatory Powers Bill (IPB) was announced earlier this month, and follows the scrapping of the draft Communications Data Bill, or "Snooper's Charter," another piece of highly controversial surveillance legislation. The point of the Bill, according to UK Home Secretary Theresa May, is to fight terrorism, organised crime, online fraudsters, and child pornographers.

The amount of data that will be collected under the Bill has led to widespread criticism among commentators and privacy campaigners.

The suggestion to try logging my own activity came from Nicholas Weaver, a senior researcher at the International Computer Science Institute at the University of California, Berkeley. He offered to analyse the data I collected, and see what he could figure out about my online activity based on just a small snapshot.

My motivation for doing this experiment was to get a better idea of what the data collection proposed under the IPB would actually include: how much data is going to be stored, and how detailed a picture of my life could it reveal?

A sample of the author's logs. Image: Joseph Cox

First, I installed software to capture my internet connections; I used Bro, an open-source network monitoring tool. Bro is typically deployed as a defensive tool to detect suspicious traffic on a company's network, but it worked perfectly well in this context. I made sure I was only logging the activity of my own computer, not anyone else's in my office, and Bro started dumping the data into files stored on my device.

Under the IPB, ISPs will collect the so-called "front page" of a website. This means that they will retain records of which site was visited, but not the specific pages.
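As a rough illustration of what keeping only the "front page" could mean, the sketch below reduces full URLs to the site name alone. This is an assumption for illustration only: the Bill does not specify a mechanism, the `front_page` helper is hypothetical, and the URLs are placeholders.

```python
from urllib.parse import urlsplit


def front_page(url):
    """Return only the host part of a URL, dropping path and query.

    Hypothetical helper: one way a record could keep "which site"
    while discarding "which page".
    """
    return urlsplit(url).hostname


# Placeholder URLs, not taken from the author's logs.
visited = [
    "https://www.example.com/health/conditions?id=42",
    "https://www.example.com/",
    "http://news.example.org/politics/article-123",
]

for url in visited:
    print(front_page(url))
```

Even in this reduced form, the site name alone can be revealing when the site itself is about a single sensitive topic.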

It's unclear how exactly ISPs would collect browsing histories. For the sake of this experiment, Weaver suggested using Bro to save the DNS and connection logs. The connection logs, Weaver explained, detail when your IP address contacts an IP address of another computer or server, and the time, port, and amount of data. Combined with the DNS log—the Domain Name System (DNS) being the way that domain names are translated into IP addresses—Weaver would be able to reconstruct my browsing activity.
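The idea of combining the two logs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Weaver's actual analysis: the record layout (`query`, `answers`, `resp_ip` and so on) is a simplified, hypothetical stand-in for what a monitoring tool writes out, and the addresses and domain names are reserved documentation values.

```python
from collections import defaultdict


def build_ip_to_names(dns_records):
    """Map each IP address seen in a DNS answer to the names that resolved to it."""
    ip_to_names = defaultdict(set)
    for rec in dns_records:
        for ip in rec["answers"]:
            ip_to_names[ip].add(rec["query"])
    return ip_to_names


def label_connections(conn_records, dns_records):
    """Attach likely domain names to each connection by matching on destination IP."""
    ip_to_names = build_ip_to_names(dns_records)
    labelled = []
    for conn in conn_records:
        names = sorted(ip_to_names.get(conn["resp_ip"], {"(unknown)"}))
        labelled.append({**conn, "names": names})
    return labelled


# Hypothetical sample data: which name resolved to which address...
dns_log = [
    {"ts": 100.0, "query": "shop.example.com", "answers": ["192.0.2.10"]},
    {"ts": 160.0, "query": "mail.example.org", "answers": ["198.51.100.7"]},
]
# ...and which addresses the computer then connected to, when, and how much data moved.
conn_log = [
    {"ts": 100.2, "resp_ip": "192.0.2.10", "resp_port": 443, "bytes": 48213},
    {"ts": 160.5, "resp_ip": "198.51.100.7", "resp_port": 993, "bytes": 9120},
    {"ts": 200.0, "resp_ip": "203.0.113.5", "resp_port": 443, "bytes": 512},
]

for row in label_connections(conn_log, dns_log):
    print(row["ts"], row["names"], row["resp_port"], row["bytes"])
```

A more careful version would only match a connection against DNS answers seen shortly before it, since several sites can share one IP address and addresses change over time.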

With Bro humming away in the background, I turned off my VPN and got to work. (A VPN, or virtual private network, reroutes your internet activity, obfuscating your original IP address.)


Just from pulling small nuggets from the data I provided him, Weaver was able to see that I had browsed Amazon and a pharmaceuticals site, and searched for information about the institute he works for.

Weaver could tell I had logged onto a site to do with Chinese censorship, and then hopped between dozens of police force and UK government websites. This was all research for articles—including the multiple visits to cam girl sites (this was for work, I promise).

Soon Weaver came across things that might raise eyebrows. He saw I visited a site where you can download open-source intelligence tools, the homepage for anonymous operating system Whonix, and the blog of operational security expert "the Grugq."

"Anonymity is suspicious," Weaver said. Several sites peddling illegal goods also popped up, including ones that sell drugs and stolen credit card data. Twitter, more shopping, and a myriad of technology, politics, and general news sites rounded off Weaver's light skimming of sites I had visited.

A sample of the author's logs, showing a visit to a site on Chinese censorship. Image: Joseph Cox

As well as browsing history, other data will be swept up by ISPs as part of what the Bill calls "internet connection records."

"'Internet Connection Records' are described in the notes to the Bill as 'a record of the internet services a specific device has connected to, such as a website or instant messaging service,'" Paul Bernal, a lecturer in law at the University of East Anglia (UEA) with a focus on surveillance legislation, told Motherboard in an email.

That would include connections to services that aren't accessed through a browser. Indeed, Weaver was able to discover that, as well as my public, work-related Riseup email account, I was connecting to another service for personal emails.

Bernal said internet connection records would also encompass other communications services, such as WhatsApp. Weaver found the server I was connecting to for my instant messaging chats.

I only ran our experiment on my computer, and didn't include my phone or any other internet-connected devices. An average user, with, say, a phone, tablet, laptop and possibly a games console, would generate many more logs than I did. However, the definition of an "internet connection record" is so hazy that the full extent of what it could include is unclear.

The instant messenger used by the author. Image: Joseph Cox

"The whole idea of an internet connection record does not exist as far as internet service providers are concerned," James Blessing, chair of the Internet Service Providers' Association (ISPA) told the UK government's Science and Technology Committee earlier this month. "It is not clear from the Bill what constitutes a connection record."

Computers don't just send messages out to the internet when a user tells them to. System updates, browser plugins, desktop widgets for the weather: all those little things that run in the background on a computer are still communicating across the internet.

"The truth is that an internet connection record has been defined so vaguely and broadly, it could mean any of those things and a lot more," Harmit Kambo, director of campaigns and development at Privacy International, told Motherboard over the phone.

"It's nothing like the comparison that Theresa May drew, of it being like an itemised phone bill. It is a much more, three-dimensional picture of you, based on not just what websites you visited, but all kinds of connections your devices are making across the internet."

The Home Office declined repeated requests from Motherboard to provide a concrete definition of an "internet connection record," pointing to the text of the draft Investigatory Powers Bill.

Regardless, even just with internet browsing histories and logs of connections between a user's device and a communications service, a colossal amount of data is collected.

In response, ISPs are going to need to establish systems for storing this data securely. If the IPB becomes law, ISPs will, after collecting the data, "have to keep it safe, have to make sure that only the correct people within your organization are actually able to access it, with the right credentials, at the right moment, to do the right thing," said Matthew Hare, CEO of UK ISP Gigaclear. "It is a non-trivial security task."

"Databases of these records would seem to be perfect targets for hackers of many kinds," Bernal warned.

Under the Bill, UK police and security services can't access the data stored by ISPs unless they're trying to find out particular things. "Access is set out as limited to establishing who is the sender of a communication, what they are connecting to, and whether they are accessing illegal content—the example given is child abuse images," Bernal explained.

But "the ISPs have to hold the internet connection record data, then allow the police, MI5, MI6 or GCHQ to access it in order to establish these things," Bernal added. "How they actually establish these things without having access to all the relevant records is not made clear."

Questions remain about how ISPs will carry out the powers under the IPB, how much it will cost, and whether it will even be effective at curtailing terrorism or serious crime.

***

In public addresses, Theresa May has attempted to reassure the country that data collected under the IPB will be of a balanced, noninvasive nature.

"It is not mass surveillance," May said in the speech announcing the draft Bill earlier this month.

But a wealth of potentially sensitive information related to UK citizens will be swept up and stored if the Bill becomes law.

Strictly speaking, the data collected under the IPB is metadata: the who, what, when of a communication, rather than the content itself. Metadata is regularly shrugged off as trivial, but my experiment with Weaver showed that, when collected en masse, it's an effective tool for building up an intricate picture of somebody's life.

People are indifferent to metadata collection "because they don't realise just how much information is in there," Weaver said. "The metadata is the message."