Activists have downloaded nearly 100 years' worth of TIME magazine issues from the publication's paywalled digital archive and dumped them all online for anyone to grab for free.
“It was possible and seems worthwhile,” Michael Best, a freedom of information activist who obtained the files, told Motherboard in an online chat.
In a statement published along with the files, Best wrote, “They’re a useful research tool with a lot of historical news and cultural information. They should be freely viewable online as they would in a well-stocked library, however most libraries lack this complete a collection of TIME Magazine back-issues.”
Best also wrote in his statement, “Fxck paywalls,” and “I was bored on Saturday.”
When users sign up for a TIME subscription, they may also get access to a back catalogue called The Vault. Although it doesn't include every issue (TIME says on its website that the archive is still in progress), it houses a treasure trove of previous magazine editions in digital form.
While browsing The Vault, Best noticed that the URL for each page is simply based on the issue's date and page number. Using that pattern, he worked out the URL for every issue and page, then used DownThemAll, a browser download-manager extension, to quickly scrape them all.
In all, Best downloaded 3,471 scanned issues of TIME (340,000 pages), spanning 1923 to 2014 and totaling 97GB of uncompressed data.
Best said the only change he made was to rename the files. Instead of the generic placeholder name that every page seemed to share, each file is now named by issue date and page number.
“Eventually, I'd like to convert the page scans into a PDF for each issue and run OCR [optical character recognition] on them, but that'll take a LOT of time and computing power,” Best told Motherboard. Running OCR would make the text of the issues searchable.
The documents are being hosted by Thomas White, a UK-based activist also known as The Cthulhu. White has previously uploaded data from LinkedIn, as well as from the extramarital affair site Ashley Madison.
Best and White could face legal action for obtaining and publishing these TIME issues.
“That's definitely a possibility,” Best said. “It probably helps that I'm not actually hosting the files, but I am the one who grabbed them and decided to share them. Accepting responsibility matters, though.”
According to TIME's terms and conditions, “Other than as expressly allowed herein or on the Web Site, you may not download, post, display, publish, copy, reproduce, distribute, transmit, modify, perform, broadcast, transfer, create derivative works from, sell or otherwise exploit any content, code, data or materials on or available through the Web Site.” TIME did not respond to a request for comment.
Best regularly uploads masses of content to the Internet Archive, such as documents obtained through Freedom of Information requests, and also uses it to mirror other people's files. This time, though, he didn't turn to his usual haunt.
“It would put them in a very awkward position if they did try to keep it up, but I think they might ultimately be forced to take it down. They've been great about hosting all of the government docs I've posted, but those are already public domain,” Best said. Uploading the TIME issues there could also have jeopardized some of the 1 million text files he has already added to the archive.
In his statement, Best included a quote from Henry R. Luce, a co-founder of TIME magazine: “Journalism is the art of collecting varying kinds of information (commonly called news) which a few people possess and of transmitting it to a much larger number of people who are supposed to desire to share it.”