Google, a Search Company, Has Made Its Internet Archive Impossible to Search
The search giant broke its Usenet archive. Again. And no one knows if or when it will get fixed.
Changes to Google Groups have been gradual, with each update stripping more functionality than the last. Photo via Marino González/Flickr
For well over a decade, Google has maintained one of the internet's most important historical archives—a collection of over 800 million messages from discussion groups dating back to 1981. And much to the chagrin of online researchers, the company has been doing a really bad job.
In December, users discovered they could no longer search for posts across the archive by date. Google, a search engine, had made its archive impossible to search.
"The Usenet archive in Google Groups is an invaluable resource for historians when it comes to researching events that occurred in the '80s and '90s," wrote Kate Willaert in a post to Google support describing researchers' latest woes. Now? Not so much.
Usenet was where the majority of online discussions took place in the 1980s and early 1990s. It was a network of topics, or newsgroups, where users could post and read messages on everything from politics to music. A service called DejaNews launched in 1995 in an attempt to archive and preserve this wealth of early internet content, and Google acquired DejaNews, along with other historical archives, in 2001.
But the problem, according to Willaert and other researchers, is the way Google Groups now handles searches for posts before or after certain dates. "The 'before:YYYY/MM/DD' and 'after:YYYY/MM/DD' terms have stopped working, and it also appears to no longer be possible to search by date," Willaert wrote. It is, apparently, a recent change.
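For readers unfamiliar with the operators Willaert describes, here is a minimal sketch of how such a date-bounded query string would be composed. The function name and the example dates are hypothetical, chosen only for illustration; the only detail taken from the source is the "before:YYYY/MM/DD" and "after:YYYY/MM/DD" format. It builds the text of a query and does not contact Google Groups.

```python
from datetime import date
from typing import Optional


def build_dated_query(terms: str,
                      after: Optional[date] = None,
                      before: Optional[date] = None) -> str:
    """Compose a search string using the before:/after: format quoted above.

    Hypothetical helper: it only assembles text and makes no network request.
    """
    parts = [terms]
    if after is not None:
        # Operators take dates written as YYYY/MM/DD
        parts.append(f"after:{after:%Y/%m/%d}")
    if before is not None:
        parts.append(f"before:{before:%Y/%m/%d}")
    return " ".join(parts)


# Illustrative dates only
query = build_dated_query("Neuromancer",
                          after=date(1984, 7, 1),
                          before=date(1984, 12, 31))
print(query)
```

A researcher would type the resulting string into the search box. According to Willaert, that is exactly the kind of query that no longer returns date-filtered results across the full archive.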
"I don't understand the point of having 30 years of Usenet archived if you can't search it with any accuracy," wrote Neil Cicierega—yes, that Neil Cicierega—in response to Willaert's post.
According to Daniel Rehn, a Los Angeles-based artist, researcher, and media archaeologist who first brought the problem to my attention, changes to Google Groups have been gradual—with each update stripping more functionality than the last.
"For several years (and through UI overhauls), Google Groups maintained an Advanced Search page which provided intensely robust searching dating back to 1981," wrote Rehn in an email. But at some point in 2013, the Advanced Search page was removed, and most of its capabilities moved to arcane text-based search operators. Then, at the end of last year, "the date-based search operators (before: and after:) stopped working entirely," said Rehn. The breakage affects only global searches across the whole Usenet archive, but that is still a fundamental feature for those who are researching archival topics and don't know where to start.
In other words, if you want to know what people in the mid-1980s thought about William Gibson's Neuromancer, the novel that popularized the term "cyberspace," or how the internet responded to the rising popularity of grunge, you can't just search against Google's entire Usenet archive for posts within a specific month or year. "Advanced searches within specific groups appear to be working, but that's hardly useful for any form of research, be it casual or academic," Rehn said.
Was the change done on purpose, or was it a mistake? No one knows. Motherboard emailed Google twice, and did not hear back.
Writer and technologist Andy Baio, in an article critical of Google posted late last month, argued that perhaps we shouldn't be trusting corporations such as Google to preserve our past in the first place. "In the last five years, starting around 2010, the shifting priorities of Google's management left [its] archival projects in limbo, or abandoned entirely," Baio wrote. "After a series of redesigns, Google Groups is effectively dead for research purposes. The archives, while still online, have no means of searching by date."
Baio pointed to the not-for-profit Internet Archive as an example of archival preservation done right. But while the Internet Archive has a Usenet archive of its own, the two collections in its possession are much smaller—less of a replacement than an alternative to the data Google owns.
What we do know is that this isn't the first time something like this has happened. In 2009, Wired's Kevin Poulsen wrote an article eerily similar to this one, after Google broke the search functionality of its Usenet archive in much the same way. The problem lasted a year until Wired's article apparently spurred the company to act.
Are you listening, Google? Maybe this article will encourage you to act again.