Image: Shutterstock Remix by Louise Matsakis

Nearly Half of the Most Popular Websites Use the Same Software to Track You Around the Internet

Third-party tracking software is increasingly controlled by only a handful of companies.

Jul 10 2017, 1:00pm

When you surf around on the internet, you're not the only one collecting information. While you check out various web pages, web trackers gather data about you, often without your consent.

Trackers have plenty of legitimate functions. For instance, "cookies" keep you logged into websites. They're what prevent you from needing to reenter your username and password every time you load a website.

The problem is that most companies don't build their own tracking tools, and instead rely on ones developed by third parties, meaning a small number of corporations have an enormous amount of data about our browsing habits. A handful of companies, like Google, CloudFront (owned by Amazon), and Optimizely, make by far the most popular tracking tools on the internet.

A new study published by independent researcher Sarah Jamie Lewis on Mascherari Press shows just how consolidated internet tracking has become.

The study scraped 1,000 of the most popular websites on the internet, including everything from Harvard.edu to AshleyMadison.com, the dating site for people looking to have an affair, and counted how many third-party trackers each used. What Lewis found was that many of the internet's most popular destinations (45 percent) are connected to each other because they use the same tracking software. Lewis dubbed the entire connected infrastructure "The Information-Tracking Superhighway."

This graph documents her results. The pink blocks are the domains that were scanned and the red blocks are third-party trackers. The lines connecting them show when a website uses the same tracking software as another. We know it's hard to read the specifics of the graph, but she created it primarily as a visualization. You can see a higher-res version on Lewis' website.

Image: Mascherari Press

For a site to be part of the Information-Tracking Superhighway, it had to share at least one third-party tracking script with another site. The study found that news sites, like CNN.com and Time.com, often share the most third-party tracking scripts with one another.
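The membership rule described above can be sketched in a few lines of Python. This is only an illustration of the "shares at least one third-party tracker with another site" test, not Lewis's actual code or data: the site and tracker names are made up, and the function name connected_sites is our own.

```python
from collections import defaultdict


def connected_sites(site_trackers):
    """Return the sites that share at least one third-party tracker with another site."""
    # Invert the mapping: for each tracker, which sites load it?
    sites_by_tracker = defaultdict(set)
    for site, trackers in site_trackers.items():
        for tracker in trackers:
            sites_by_tracker[tracker].add(site)

    # A site is "connected" if any of its trackers also appears on another site.
    connected = set()
    for sites in sites_by_tracker.values():
        if len(sites) > 1:
            connected |= sites
    return connected


# Hypothetical scan results (invented domains, not data from the study).
example = {
    "news-a.example": {"tracker-x.example", "tracker-y.example"},
    "news-b.example": {"tracker-x.example"},
    "shop.example": {"tracker-z.example"},
    "blog.example": set(),
}

linked = connected_sites(example)
share = len(linked) / len(example)
print(sorted(linked), share)
```

In this toy data set, only the two news sites load a common tracker, so half of the scanned sites count as connected.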

"I don't have an issue with tracking in general," Lewis explained. "However it's the fact that every site is using the exact same provider."

Jacob Hoffman-Andrews, a senior staff technologist at the Electronic Frontier Foundation, agreed that the consolidation of tracking companies is a problem. "It would be better for privacy if we relied on fewer third parties," he told me on a phone call. He said it would be time-consuming and expensive, but not impossible, for websites to create their own tracking software for legitimate uses.

The study's results echo a similar project conducted by Steven Englehardt and Arvind Narayanan at Princeton University last year. They crawled one million popular websites and found that news websites have the most trackers, and that those belonging to government organizations, universities, and nonprofits have the fewest. Overall, they found that top sites often host between 25 and 30 third parties, many of which are trackers.

The Princeton study also showed that most third-party scripts have the ability to communicate with each other—meaning information about you can be shared from website to website. The practice, called "cookie syncing," allows different trackers to share user identifiers with each other. So not only are trackers controlled by a small number of tech companies, they're also regularly sharing information about you with each other.
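To make the idea of cookie syncing concrete, here is a toy simulation in Python. It assumes the simplest version of the practice: one tracker's script makes your browser send a request to a second tracker with the first tracker's ID for you attached, and the second tracker files that ID next to its own. The Tracker class, the domains, and the IDs are all hypothetical, and real trackers do this over HTTP rather than with direct function calls.

```python
class Tracker:
    """A toy tracker that remembers which partner IDs belong to its own user IDs."""

    def __init__(self, name):
        self.name = name
        # Maps this tracker's ID for a user to {partner name: partner's ID}.
        self.partner_ids = {}

    def sync(self, own_id, partner_name, partner_id):
        """Record that our user `own_id` is known to a partner as `partner_id`."""
        self.partner_ids.setdefault(own_id, {})[partner_name] = partner_id


# The browser holds a separate cookie (user identifier) for each tracker.
browser_cookies = {
    "tracker-a.example": "A-123",
    "tracker-b.example": "B-789",
}

tracker_b = Tracker("tracker-b.example")

# Tracker A's script triggers a request to tracker B that carries A's ID.
# The browser attaches B's own cookie, so B can now link the two identifiers.
tracker_b.sync(
    own_id=browser_cookies["tracker-b.example"],
    partner_name="tracker-a.example",
    partner_id=browser_cookies["tracker-a.example"],
)

print(tracker_b.partner_ids)
```

After the sync, tracker B knows that its user B-789 and tracker A's user A-123 are the same browser, which is what lets the two pool what they have each observed.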

The information being collected about you by web trackers hasn't yet become a major privacy problem, but it could as tech firms grow even more powerful. For example, what could happen to the trove of web-tracking data Google has if its next CEO is untrustworthy or has ulterior motives? "We're going to run into issues in the future," Lewis said.

The kind of data that web trackers gather about you is also more sensitive than it might seem at first. "We do most of our thinking on the web these days," Hoffman-Andrews said. "Oftentimes when we have a question or are curious about something, we search for it online." Over time, the data web trackers collect on us can paint "a deep picture of your personality and your interests," he said.

Thankfully, there are several tools you can use to stop third-party scripts from gathering information about you. Browser extensions like Ghostery and Privacy Badger are effective at stopping cookies from following you, as are your browser's built-in cookie-blocking features.

Most browsers also have a Do Not Track (DNT) setting. When turned on, DNT sends a signal to websites that you wish not to be tracked. The problem is that many websites choose not to respect DNT signals, and have no legal obligation to do so.
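In practice, the DNT signal is an HTTP request header, "DNT: 1", that the browser adds to each request. The Python sketch below shows how a website that chose to honor the signal might check for it before adding a tracking script to a page. The function names and the tracker URL are hypothetical, and nothing here reflects how any particular site actually behaves.

```python
def wants_no_tracking(headers):
    """Return True if the request headers carry the Do Not Track opt-out (DNT: 1)."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


def tracking_snippet(headers):
    """Return the tracking script tag, or nothing if the visitor sent DNT: 1."""
    if wants_no_tracking(headers):
        return ""
    # Hypothetical third-party tracker URL, for illustration only.
    return '<script src="https://tracker.example/t.js"></script>'


print(repr(tracking_snippet({"DNT": "1"})))
print(repr(tracking_snippet({"User-Agent": "ExampleBrowser"})))
```

The check is a single line, which underscores that honoring DNT is a matter of choice for a website rather than technical difficulty.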

Lewis also advocates using Tor, software that enables you to browse the internet anonymously. "The main reason I like Tor and related technologies is because they start from a basis of consent," Lewis explained.

"Our tech needs to be consensual," she went on. "So people aren't surprised when they find themselves in data sets."
