Why the PDF Is Secretly the World's Most Important File Format

The story of the PDF, the file format that’s become one of the internet’s defining information tools. It’ll be with us after we’re long gone.

Mar 5 2018, 4:30pm


A version of this post originally appeared on Tedium, a twice-weekly newsletter that hunts for the end of the long tail.

The Portable Document Format, or PDF, is everywhere. But it's still a format that causes headaches for the average person.

Just take former Trump campaign manager Paul Manafort, who may not be the average person, but who runs into issues with the PDF just like the best of us.

Justice Department Special Counsel Robert Mueller’s most recent indictment of Manafort noted how the lobbyist and his colleague, Richard Gates, collaborated on modifying a PDF document by converting it into Word format, changing an amount in it, and then converting it back to a PDF.

This created something called a paper trail, bolstering Mueller’s case against Manafort.

It's not often, of course, that the PDF gets this level of notice. The PDF's origin story is a bit more boring than those of the MP3, which was built around the contours of Suzanne Vega’s unaccompanied voice on “Tom’s Diner,” and the ZIP file, which came to life in a brutal legal battle that was egged on by the whims of BBS users.

But the PDF still has a story, and that story is that of a format that promises to be even more valuable in the decades to come. Here's why.

“What industries badly need is a universal way to communicate documents across a wide variety of machine configurations, operating systems and communication networks. These documents should be viewable on any display and should be printable on any modern printers. If this problem can be solved, then the fundamental way people work will change.”

— John E. Warnock, the cofounder of Adobe, discussing his thought process around the need for a simple document format in an essay revealing the existence of The Camelot Project (which is, of course, in PDF format). Warnock, who also helped develop Adobe’s bedrock PostScript page description language, noted that PostScript and its sister language, Display PostScript, were too heavy for most computers being made at the time he wrote his essay, around 1990. “The Display PostScript and PostScript solutions are the correct long-term solution as the power of machines increases over time, but this solution offers little help for the vast majority of today’s users with today’s machines,” he explained.


Why the “killer app” for the PDF may have been, of all things, tax forms

Around the time that Warnock and his colleagues at Adobe were trying to figure out the difficult problems of creating a simple file format that could be used to read documents on regular people’s computers, the Internal Revenue Service was dealing with an annual headache that it faced in working with the US Postal Service.

Basically, every year just before tax season, the IRS would mail out tax forms to taxpayers across the United States. During non-Census years, this was the largest annual mailing the postal service had to handle: around 110 million individual mailings, according to a 1991 New York Times article. And the IRS, working with a complicated tax code, had to manage a wide variety of exceptions and differing forms for both businesses and individual taxpayers.

This was not only incredibly wasteful—never a good thing when you’re the Internal Revenue Service—but it represented something of a logistical nightmare, because it also hinted at the ways that paper gummed up the works throughout the federal government.

This was a situation where the PDF would have been of immense value. Certainly, software solutions had existed on the market at that time—among others, TurboTax on the PC and MacInTax on the Mac—but the average American user wasn’t necessarily at a point where they would trust their computer to do their taxes. But they might be cool with printing the forms.

Fortunately, Adobe was ready. At the end of 1992, the company first showed off its PDF technology, under the brand name Acrobat, at the COMDEX trade show. The trade press of the time wrote about Acrobat with much excitement, as it offered a way to share a document exactly as it would appear on a printed page, whether or not it ever needed to be printed. It was even named “Best of the Show” that year.

But Warnock admitted that, early on, his approach to solving the problem of excess paper didn’t catch on.

“When Acrobat was announced, the world didn’t get it. They didn’t understand how important sending documents around electronically was going to be,” Warnock said in a 2010 interview with Knowledge@Wharton.

But the fact of the matter was, Adobe had the perfect use case already out there in the form of the IRS, not to mention the rest of corporate America.


Adobe had a potential solution to cut down on the mountains of paper being produced by offices the world over. And as Adobe had the de facto market standard already with PostScript, it also had the inside lane. You can see where this is going.

According to NetworkWorld, the IRS was already distributing tax forms in PDF format in early 1994, a move that helped build broad momentum behind the format.

But one element was missing: the web, which would make accessing tax documents relatively easy. By the 1996 tax season, that element was ready to go, as the Internal Revenue Service booted up its web servers with more than 600 documents ready for download in PDF format, according to a 1996 column from tech guru Kim Komando.

A case study on Adobe’s website notes that the IRS went all-in on the PDF around this time, giving copies of Adobe’s software to more than 100,000 employees as of 2001 and saving millions of dollars in printing costs in the process.

Beyond saving the cost of mailing most of those forms, the PDF saved the agency lots of headaches by making materials easier to find in audits. Instead of being tucked away in obscure file cabinets, documents could be accessed electronically by tax examiners and auditors.

“In terms of employee satisfaction alone, Acrobat pays for itself,” an IRS official told Adobe. “Add to that the benefits of easier document administration and less paper storage, and it’s clear that Acrobat and Adobe PDF provide real returns to the agency and the people we serve.”

Clearly there’s some fluff in that quote, but the IRS was very much a microcosm of the business world at large. The PDF, in a very short amount of time, became one of the most important ways business users shared documents. (Academia, of course, quickly bought in as well.)

The PDF simplified the hard work of going to Kinko’s, because the file format was able to easily embed assets like fonts and images, streamlining one of the hardest parts of getting a file printed. (Of course, you generally couldn’t make changes in PDF form.) Eventually, the PDF became searchable and even editable.

And most importantly, in the case of the IRS, “fillable.” The IRS quickly created versions of its tax forms that allowed end users to put in their own numbers, and, eventually, even their own signatures.

While none of this was as lightweight as, say, a text file, or as flexible as HTML, it sure beat PostScript for the average person.

And the PDF became the long-term solution.

“PDF has become a de facto global standard for more secure and dependable information exchange since Adobe published the complete PDF specification in 1993. Both government and private industry have come to rely on PDF for the volumes of electronic records that need to be more securely and reliably shared, managed, and in some cases preserved for generations.”

— A portion of the foreword of the ISO 32000-1 standard, the first standardized version of the full PDF specification, published in 2008. While Adobe first created the PDF in 1993, it left the format open so that other companies could use it, allowing it to become a de facto standard. (Adobe largely charged for the creative tools.) But in 2007, Adobe worked with the International Organization for Standardization (ISO) to turn the format into an open standard. The move highlighted just how prevalent the format had become.

Perhaps the most important role of the PDF in the modern day is archival

Let’s just admit something straight out: Standardization is boring.

It’s a dull topic, but it’s incredibly important in the world of archiving. The reason is obvious, of course: If you randomly change the way you produce and store microfilm, for example, that microfilm becomes a pain to reuse.

But this also cuts both ways. There are things that you don’t necessarily want out of a standard. Let’s say you don’t care about interactivity because you’re trying to digitize documents that date back hundreds of years.

Still, there may be niceties you want, like the ability to make the text searchable. And perhaps you want to ensure maximum compatibility, working with all variants of a tool.

All these reasons, and more, are why the PDF/A format was created in 2005. Unlike a standard PDF, which is designed to take advantage of the fact it’s made for a computer, PDF/A was designed to be maximally reproducible, to the point where it could replace a printed document if the original paper was lost.

“Everything that is required to render the document the exact same way, every time, is contained in the PDF/A file: fonts, colour profiles, images etc. PDF/A is also an ISO standard, guaranteeing that future software generations will know how to open and render PDF/A files,” explains Shawna McAlearney, a marketing specialist for Appligent Document Solutions, in an FAQ on the PDF Association website.

This is good for organizations such as the Internet Archive and the Library of Congress, which are saving information for the long haul and need it to be readable 30 years from now. But it has also led to some controversy in the archival space, such as when the format was extended in 2012 to allow the embedding of files like spreadsheets and HTML documents.

The quick uptake of PDF/A has its critics, too. In a paper on the subject, Marco Klindt of the Zuse Institute Berlin lays out a variety of issues with the format from an archival perspective, including (among other things) that it can be cumbersome to use.

(Notably, usability expert Jakob Nielsen has also strongly come out against the use of PDFs for the same reason, stating on his consultancy’s website: “PDF is good for printing, but that's it. Don't use it for online presentation.”)

Klindt, who also lays out legal and integrity issues with the format, suggests that the desire for a suitable preservation format limited discussion of whether or not the format really made sense in the long run.

“Familiarity of PDF led to fast and widespread adoption of PDF/A as a solution in the field of digital archiving,” he writes. “This fact may have muted prophetic voices demanding the quest for and development of more suitable content containers for research work (text and data) with reuse in mind.”

Even if this is the case (I’ve certainly loaded my share of 300-megabyte PDFs over the years, and there are plenty of documents online that have no business being PDFs), it’s worth admiring how much the format has done to digitize and protect our collective knowledge.

In 50 years, these PDFs, even with their weaknesses, will help us document history without the ephemeral nature of the web. And unlike paper documents, they won’t suffer from frayed pages.

The history of our generation will probably be in PDF form.

“[Adobe’s] board wanted to kill it. I said, ‘There’s just no way. This is solving an important problem, and we are going to hang in there until it works.’”

— Warnock, speaking to Knowledge@Wharton about Acrobat’s early years. These days we take for granted that PDFs are common basically everywhere online, but there was a point when the format was in such dire shape that Adobe had to stop charging for Acrobat Reader, a move Warnock described as a “very risky choice.” (The company charges lots of money for Acrobat instead.) But the decision to stick with the client and make it free ultimately proved key to Adobe’s success as a company. Even though people might be quicker to think of Photoshop when they think of Adobe, a 2013 profile of the Adobe cofounder by his alma mater, the University of Utah, ultimately credited the company’s success to the document format Warnock created. “The PDF put Adobe on the map,” author Jason Matthew Smith wrote.

Going back to Manafort: Is there anything he could have done differently to keep his case from becoming an outright embarrassment?

According to the PDF Association, the answer is most certainly yes.

Beyond the fact that converting a PDF to Word and back creates subtle formatting changes that can be tracked, software like Adobe Acrobat can be used to edit the text in a PDF directly!

Here’s the association’s take:

Manafort could have readily altered the PDF himself. Had he done so, he would have avoided a key part of the paper trail that may land him in federal prison. He probably even had a PDF editor already on his computer.

In the money-laundering business, after all, it seems likely that one would frequently need to assemble pages from multiple PDF files; you need a PDF editor for that. For most of his money-laundering career, Manafort was almost certainly just one or two clicks away from the editor mode.

The result is that PDF editing is likely to play a significant role in a major political scandal.

The story of the invention of the PDF may not have a legal battle at the center of it or a hook like a Suzanne Vega song to push its story forward, but it does have this scandal. And love it or hate it, Manafort's awkward use of a tool used by basically everyone highlights just how prevalent the PDF really is.