This story is over 5 years old.

Nine Out of Ten of the Internet’s Top Websites Are Leaking Your Data

New research has quantified the "privacy compromising mechanisms" on the one million most-visited websites, and they're everywhere. Guess who's responsible for most of them?

The vast majority of websites you visit are sending your data to third-party sources, usually without your permission or knowledge. That's not exactly breaking news, but the sheer scale and ubiquity of that leakage might be.

Tim Libert, a privacy researcher with the University of Pennsylvania, has published new peer-reviewed research that sought to quantify all the "privacy compromising mechanisms" on the one million most popular websites worldwide. His conclusion? "Findings indicate that nearly 9 in 10 websites leak user data to parties of which the user is likely unaware."

Libert used his own open-source software, webXray (the same program he has used in the past to analyze trackers installed on health and porn websites), and found that not only were most sites siphoning user data, they were sharing it all over the place.

"Sites that leak user data contact an average of nine external domains," he wrote in the new paper, published in the International Journal of Communication, "indicating that users may be tracked by multiple entities in tandem."

In other words, when you visit a website (say, Airbnb.com, Yahoo.com, or Motherboard.tv), that site will likely pass your data along to around nine other, outside domains. These include Google, through its analytics software, which is installed on a colossal number of sites across the web (46 percent, per Libert's research); Facebook; and WordPress.

Furthermore, Libert found that "more than 6 in 10 websites spawn third-party cookies; and more than 8 in 10 websites load JavaScript code from external parties onto users' computers."
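These measurements boil down to recording which outside hosts a page makes your browser contact. The sketch below is not webXray's code; it is a minimal illustration, assuming a list of request URLs already captured while a page loads. It treats the last two labels of a hostname as the "domain" (a simplification that misreads suffixes such as .co.uk; a real tool would consult the Public Suffix List), and it flags a script as third-party purely by its `.js` extension. The function names and the request log are hypothetical, with URLs invented for illustration.

```python
from urllib.parse import urlparse


def base_domain(url: str) -> str:
    """Approximate the site-level domain as the last two hostname labels.

    Simplification: wrong for multi-part suffixes such as .co.uk; a real
    crawler would consult the Public Suffix List instead.
    """
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def external_domains(page_url: str, request_urls: list[str]) -> set[str]:
    """Distinct domains, other than the page's own, that the page contacted."""
    first_party = base_domain(page_url)
    return {
        domain
        for domain in (base_domain(u) for u in request_urls)
        if domain and domain != first_party
    }


def third_party_scripts(page_url: str, request_urls: list[str]) -> list[str]:
    """Requested JavaScript files served from a domain other than the page's."""
    first_party = base_domain(page_url)
    return [
        u
        for u in request_urls
        if urlparse(u).path.endswith(".js") and base_domain(u) != first_party
    ]


# Hypothetical request log for one page load (URLs invented for illustration).
page = "https://www.example.com/article"
request_log = [
    "https://www.example.com/style.css",
    "https://cdn.example.com/app.js",
    "https://www.google-analytics.com/analytics.js",
    "https://connect.facebook.net/en_US/sdk.js",
    "https://stats.wp.com/stats.js",
]

print(sorted(external_domains(page, request_log)))
# ['facebook.net', 'google-analytics.com', 'wp.com']
print(len(third_party_scripts(page, request_log)))
# 3
```

With this toy log, the page's own CDN counts as first-party, while three outside domains are contacted and three third-party scripts are loaded.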

Tracking, Libert told me, is "utterly endemic."

"There is one web users see in their browsers, but there is a much larger hidden web that is looking back at them," he told me. "I always find it funny when old TV shows will have a gag where somebody on the screen can 'see' into your living room. It's obviously silly with old technology, but that's really how the web works! For every two eyes looking at a screen there are probably ten or more looking back at them."

So what does that mean for you, the average daily internet browser?

"If you visit any of the top one million sites there is a 90 percent chance largely hidden parties will get information about your browsing," Libert told me. "Most troubling is that if you use your browser setting to say 'Do Not Track' me, the explicitly stated policy of nearly all the companies is to flat-out ignore you," he added, referring to the Do Not Track (DNT) setting available in browsers.

Libert's analysis finds that, unsurprisingly, one company is doing the lion's share of the tracking.

"The worst perpetrator is Google, which tracks people on nearly 80 percent of sites, and does not respect DNT signals," he said.

A Google spokesperson declined to comment, but pointed to the company's Terms of Service, which state that it is against company policy for Google Analytics to send personally identifiable information of any kind. He also noted Analytics' privacy controls and data-sharing settings, and its opt-out browser extension.

Libert says that's misleading.

"The company acts as though users have a choice to follow special instructions to opt-out of Analytics, but this is absurdly disingenuous as all Google needs to do is check a simple, and universally available, browser setting," he said. "It is even more comical when you consider most people never get any notification Google is tracking them. Of course this goes for Facebook and pretty much all other Internet companies as well." There are exceptions, however.

"That said, the positive takeaway is that Twitter is taking a real lead in the industry by respecting DNT and deserves some serious credit," he said. "If all companies were acting like Twitter I wouldn't have much to complain about in this regard."
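Libert's point that honoring Do Not Track only requires checking a browser setting can be made concrete: a browser with the setting enabled sends the HTTP request header `DNT: 1` with its requests. The following is a minimal sketch, assuming a server that receives request headers as a plain dictionary; the function names and the tag URL are hypothetical and do not represent how Google, Twitter, or any other company actually implements this.

```python
from typing import Mapping


def do_not_track(headers: Mapping[str, str]) -> bool:
    """Return True if the request carries the Do Not Track signal ("DNT: 1")."""
    # HTTP header names are case-insensitive, so normalize them before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


def analytics_tag(headers: Mapping[str, str]) -> str:
    """Emit a hypothetical third-party tracking tag only when DNT is not set."""
    if do_not_track(headers):
        return ""  # Respect the user's choice: load no tracker at all.
    return '<script src="https://analytics.example/tag.js"></script>'


print(repr(analytics_tag({"DNT": "1"})))  # ''
print(analytics_tag({"User-Agent": "demo"}) != "")  # True
```

In this sketch a value of `0`, or no header at all, is treated as no objection; only an explicit `1` suppresses the tracker.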

Mass tracking has implications for surveillance, too, Libert said. "The other take-away goes back to the Snowden revelations on the NSA spying programs. What we really learned wasn't that the NSA was spying on people—it's that the NSA was spying on companies that were spying on people—which is way easier as there are a handful of companies (like those in the PRISM slide) who need to have their arms twisted into cooperating."

"Even if the companies say there is no PRISM program and they don't collaborate with the NSA," he continued, "that doesn't change the fact that they have created a one-stop shop for every intelligence agency on earth. What they've done is make population-level surveillance cost-effective in a way that a military contractor would never do."

And if you want to keep from being tracked? As it stands, the options are limited.

"Tor is pretty much your best bet," Libert said, "with the provision you don't log into any accounts (e.g. Facebook, Gmail, etc.) as then you have identified yourself and may be subject to tracking."