After Snowden, some words seem harder to Google: Researchers found a "significant" fall in search traffic on 'high government trouble'-rated search terms.
Photo by Spyros Papaspyropoulos
How risky is it to use the words "bomb," "plague," or "gun" online? That was a question we posed, tongue in cheek, with a web toy we built last year called Hello NSA. It offers suggested tweets that include terms drawn from a list of watchwords that analysts at the Dept. of Homeland Security have been instructed to search for on social media since 2011. "Stop holding my love hostage," one of the tweets read. "My emotions are like a tornado of fundamentalist wildfire."
It was silly, but it was also imagined as an absurdist response to the absurdist ways that dragnet surveillance of the public and non-public Internet jars with our ideas of freedom of speech and privacy.
And yet, after reading the mounting pile of NSA PowerPoints, are all of us as comfortable as we used to be Googling for a word like "anthrax," even if we were simply looking up our favorite thrash metal band? Maybe not.
According to a new study of Google search trends, searches for terms deemed to be sensitive to government or privacy concerns have dropped "significantly" in the months since Edward Snowden's revelations in June.
"It seemed very possible that we would see no effect," MIT economist Catherine Tucker and digital privacy advocate Alex Marthews write. "However, we do in fact see an overall roughly 2.2 percentage point fall in search traffic on 'high government trouble'-rated search terms."
Tucker and Marthews asked nearly 6,000 people to rate the sensitivity of a pile of keywords—including those on the DHS social media watchword list—based on whether the word would "get them into trouble" or "embarrass" them with their family, their close friends, or with the US government.
Marthews, the head of the Cambridge-based digital advocacy group Digital Fourth, had predicted an effect on search behavior. Tucker, a professor at MIT's Sloan School of Management who specializes in data issues, had not expected to see an effect.
But by analyzing Google's publicly-available search data, they noticed a general pattern: even as searches for less sensitive words appeared to rise, searches for the most suspicious words fell.
"This is the first academic empirical evidence of a chilling effect on users’ willingness to enter search terms that raters thought would get you into trouble with the US government," Tucker wrote in an email.
Researchers found a rise in search terms with "low" government sensitivity and a decline in terms with "high" sensitivity
While searches for government-sensitive terms dropped in the US, the data indicates only a minor drop in US-based "privacy-related" searches, the sort that might get you in trouble with friends or family (see examples below).
Outside of the US (the researchers also looked at searches from the US's top ten trading partners), they found that Google users tended to search less on both government-sensitive terms like “anthrax” (those searches dropped by 1.1 percentage points) and personally-sensitive terms like “eating disorder” (those searches saw a nearly 1.6% decline), even as less sensitive terms showed a general rise. The trend was led by searches in the United Kingdom and Canada, and, to a lesser extent, by France, Mexico, Japan, Brazil and China.
There were some surprises too: After the PRISM revelations, Google users in Saudi Arabia searched less on words considered to be privacy-insensitive but searched slightly more for privacy-sensitive terms. Germany, meanwhile, shows a pattern of rising search traffic for both "low-privacy" and "high-privacy" terms, as does South Korea.
The authors note a number of limitations in their study, and speculate that one reason for the drop in sensitive Google search terms could simply be international users switching to other search engines in reaction to the surveillance revelations.
"Maybe they will also switch to non-US search engines," they note. "Maybe there will be broader knock-on effects on sales of Google products such as Android phones. If international users are reacting strongly, it may damage the exports of US tech companies."
In the data industry, reports the Times, government surveillance is now considered "the new normal." Projections by the Information Technology and Innovation Foundation put lost revenue for the United States cloud computing industry at $35 billion by 2016. Under a worst case scenario, the losses could be as high as $180 billion, or 25 percent of industry revenue, according to industry analyst Forrester Research.
But immediate economic losses are only one side-effect of once-secret surveillance programs. In the days following the first revelations by Edward Snowden, the ACLU filed suit again against the government over the "chilling effect" of mass surveillance on Americans. In October, a group of scholars and journalists from Columbia and MIT wrote a letter to a Presidential advisory group warning of the effects on journalists and documenting particular instances of harm. "The NSA has made private communication essentially impossible," it wrote.
Besides the President's advisory panel, another group in the executive branch, the Privacy and Civil Liberties Oversight Board (PCLOB), is charged with independently reviewing anti-terrorism efforts to see if they comply with established law and remain compatible with “liberty concerns.” In a 238-page report (pdf) issued in January, the group bluntly condemned the NSA's bulk telephony call records program as illegal and mostly ineffective, urging the President to shut it down. Now the PCLOB is turning its attention to the PRISM program.
In an email, Marthews wrote that his and Prof. Tucker's research might be "particularly interesting to people engaged in the debate around government surveillance on both sides, and to lawyers interested in First Amendment chilling effects. We don't expect people to agree on whether that's good or bad, but our article shows that this chill is definitely happening."
In general, search terms with "high" privacy sensitivity declined, even as "low" privacy sensitive terms appeared to rise. Germany and South Korea were exceptions.
While the study offers an early portrait of one potential chilling effect of Edward Snowden's revelations, it carries limitations. The big data analysis alone doesn't prove that news about the NSA caused Google users to censor their searches; it merely shows evidence that it might have. "At such an aggregate level," the researchers caution in their paper, "it is hard to straightforwardly assert that the changes we observe were attributable to the surveillance (and particularly the PRISM) revelations."
While the authors accounted for other factors that might explain the changes in searching, including weather and other news stories, they could not account for all outside factors. It's also possible, they caution, that the detected effect could have been caused by tweaks made by Google itself. But the company doesn't reveal how it calculates its data; it also doesn't offer actual numbers of searches, but rather rates them on a scale based on how often a term is searched for relative to the total number of searches.
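To make the idea of a relative scale concrete, here is a minimal sketch in Python. It assumes a simple scheme in which each period's value is the term's share of all searches, rescaled so that the peak period equals 100. Google does not publish its exact calculation, so the function name `relative_index` and all the counts below are hypothetical and purely illustrative.

```python
def relative_index(term_counts, total_counts):
    """Return a 0-100 index of a term's share of all searches per period.

    Illustrative assumption only: the real calculation is not public.
    """
    if len(term_counts) != len(total_counts):
        raise ValueError("both series must cover the same periods")

    # Share of all searches that went to this term in each period.
    shares = [t / n if n else 0.0 for t, n in zip(term_counts, total_counts)]

    peak = max(shares, default=0.0)
    if peak == 0:
        return [0 for _ in shares]

    # Rescale so the period with the highest share is 100.
    return [round(100 * s / peak) for s in shares]


# Hypothetical counts: searches for the term double, but so do all searches.
flat = relative_index([50, 100], [1000, 2000])

# Hypothetical counts: total searches stay fixed while the term varies.
varied = relative_index([50, 100, 25], [1000, 1000, 1000])
```

In the first call the index stays flat even though the raw number of searches doubled, which is why an index of this kind can show changes in relative interest but cannot reveal how many searches were actually made.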
"This is the fundamental problem with using these data; the process of generating the data is secret," said Gary King, director of Harvard's Institute for Quantitative Social Science. King has become familiar with the limitations of big data as he's examined, among other things, how censorship works on China's social media. The secrecy surrounding web metrics, he said, "has to be the case for commercial reasons, I suppose, but science does not exist without sharing information."
Google Trends has been used widely in studies of financial and health forecasting. But claims that Google Trends can predict the flu, for instance, have been largely deflated, the result of "big data hubris." There are more playful uses of search data too: Last year, when Princeton researchers used Trends data to foretell the demise of Facebook, Facebook researchers responded with their own Trends analysis that found that Princeton was on the decline.
Judges rated sensitive search terms, partly drawn from a Dept. of Homeland Security watchlist, according to the level of embarrassment they would cause users
But the study's findings dovetail with another report published in January by PEN America. In a survey (pdf) of 528 writers, it found that after the NSA revelations, writers are more fearful of government surveillance than the general populace—enough so that they are turning down book deals and speaking opportunities and shying away from writing about certain topics, "including military affairs, the Middle East North Africa region, mass incarceration, drug policies, pornography … the study of certain languages, and criticism of the US government.” Sixteen percent of writers polled by PEN said they won’t do certain Google searches in case it piques the government's interest; and one in four say they regularly self-censor in email and on the phone.
PEN included anonymous testimonies from some of the writers it surveyed. Before reports on NSA surveillance, one writer told PEN that she had considered researching a book about civil defense preparedness during the Cold War. “What would be the perception if I Googled ‘nuclear blast,’ ‘bomb shelters,’ ‘radiation,’ ‘secret plans,’ ‘weaponry,’ and so on? And are librarians required to report requests for materials about fallout and national emergencies and so on? I don’t know.”
How Your Searches Could Be Searched
After the passage of the Patriot Act, the FBI began sending more national security letters to librarians, companies and Internet service providers. These subpoenas are meant to collect data about individuals suspected of connections to terrorism or spying, without their knowing about it and with gag orders often imposed on recipients of the letters.
Aside from asking Google, for instance, for your search history directly, law enforcement can also get a warrant for a wiretap on your internet connection, or they could search your computer physically. The NSA has a broader set of tools at its disposal, as Edward Snowden's revelations showed: through the PRISM program, under section 702 of the FISA Act, the agency can get court orders to collect data from Internet companies, but it can also gather Internet data of anyone, including Americans, found to be two steps away from a "legitimate foreign intelligence target."
From January to June of 2013, Google reported that it received between 0 and 999 FISA orders, for the targeting of between 9,000 and 9,999 user accounts (even after a legal settlement to disclose in January, the companies are not permitted to detail exact numbers). During the same period, Yahoo received between 0 and 999 such orders that targeted between 30,000 and 30,999 accounts. How many of these are PRISM orders is unclear, as the data released by the Internet companies is lumped in with Section 107 orders.
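The ranges quoted above follow a simple pattern: each figure is reported as a band 1,000 wide. As a rough sketch, assuming only that pattern, the banding can be written in Python as follows. The function name `disclosure_band` is hypothetical, and the example counts are invented for illustration, since the true counts are not public.

```python
def disclosure_band(count, width=1000):
    """Return the (low, high) band of the given width that contains count.

    Illustrative only: it mirrors the 0-999 and 9,000-9,999 ranges
    quoted in the article and is not any company's actual procedure.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    low = (count // width) * width
    return (low, low + width - 1)


# Hypothetical true counts; any value inside a band yields the same report.
orders_band = disclosure_band(412)
accounts_band = disclosure_band(9500)
```

Because every count from 9,000 through 9,999 maps to the same band, a published range says little about whether the underlying number rose or fell within it from one reporting period to the next.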
Revelations by Edward Snowden also detailed other more surreptitious ways the NSA and the GCHQ gather data, including search terms, from web users. The GCHQ's TEMPORA operation tapped global Internet links. NSA engineers devised a program codenamed MUSCULAR that appeared to be capable of reverse-engineering personal data being ferried around the web by exploiting the connections between the data centers of Google and Yahoo. In the case of Google, as Ars Technica reported:
As of 2012, the NSA developed "defeat fingerprints" to scan the server-to-server communications that powered Google Adwords, Blogger, the BigTable database that powers Google Drive and other applications, and the TeraGoogle search index interface. These fingerprints allowed the NSA to scan Google internal traffic and identify elements associated with the usage of specific individuals or for searches and other behavior around a particular subject of interest (like, say, "pressure cooker bomb").
This near real-time data could be fed into a tool called XKeyscore to create social graphs of individuals' lives in order to identify their associates and search for foreign terrorists, as well as to assemble a person's geolocations over time and other detailed personal information.
The companies later encrypted their internal connections, and made encryption default on all web searches. Since 2011, Google has offered further protection to user searches and other data at encrypted.google.com (an explanation of how that works can be found here). The Electronic Frontier Foundation also offers HTTPS Everywhere, a free plugin for web browsers that does something similar.
NSA headquarters in Fort Meade, MD. Photo: Flickr / mjb
Last July, journalist Michele Catalano claimed she and her husband had been visited by police officers who asked him, "Have you ever looked up how to make a pressure cooker bomb?" He had, he explained, out of sheer curiosity. The article went viral, but some questioned the details: could the government really have listened in on his Google searches? It subsequently turned out that Catalano's husband's searches hadn't been seen by the police, the FBI, or the Dept. of Homeland Security, but rather by his co-workers, who became suspicious and called the police. While the story may have been "bogus" (it has since been removed from Medium, where it first appeared), the episode evidenced the way that Google searches, even innocent ones, can arouse suspicion.
The "government sensitive" words used in the study were drawn from a list crafted by the Department of Homeland Security beginning in 2010. The document, first revealed after a 2012 FOIA request by the Electronic Privacy Information Center, and recently re-FOIA'd by MuckRock, describes a program that monitors "online forums, blogs, public websites, and messages boards" for particular words, and disseminates information to "federal, state, local, and foreign government and private sector partners." Executed in part by individuals who fabricate social media profiles to watch other users, the program includes "minimization" procedures to protect individuals' privacy, but can store personal data for up to five years. It does not appear to scan Google search terms.
Marthews hopes the paper can provoke policy makers and citizens to think more about the unseen effects of powerful government surveillance programs. "The article helps to establish a common fact for all sides: that surveillance does indeed chill search behavior. We don't expect people to agree on whether that's good or bad, but our article shows that this chill is definitely happening."
He and Tucker are now planning a new study that will refine their data analysis and make state-by-state comparisons to better understand how our search behavior is affected by our knowledge that someone's watching.
"I am sure though there will be some people who just don't believe any econometric exercise if it goes against what they believe to be true," wrote Professor Tucker, who noted her initial skepticism about a measurable chilling effect. "The empirical analysis shows a clear effect on Google searches of increased awareness of government surveillance, so, while it's not what I was expecting, I believe what the data are telling me."