What Computers Can and Won't Tell Us About Who Wrote Shakespeare's Plays

Even though it almost definitely could.

Ben Richmond

A Chicago Tribune cartoon about the Shakespeare authorship debate from 1916. Image: Wikimedia Commons

We made it about an hour into the conversation before I finally asked Matthew Jockers if he could figure out who wrote William Shakespeare's plays. He just laughed.

It wasn't something out of the blue, and it's not something I just go around asking random people. I know that some people think that Sir Francis Bacon, or Ben Jonson, or Queen Elizabeth used the name "William Shakespeare" as a pseudonym, because the Bard's plays deal with court life and Italy and whatnot, which seems far beyond what the son of a glove maker from Stratford, who doesn't seem to have traveled much or known many people, would know. But I don't have much of a personal stake in the matter; I just wanted to see what Jockers could tell me.

Jockers is a researcher who studies "computational text analysis" and is a faculty fellow in the University of Nebraska's Center for Digital Research in the Humanities, a new program, in the new field of "digital humanities," a catch-all term for where databases and new computational tools get applied to old, old forms. Among other things, it's where big data and big books collide.

I had called Jockers up to talk about his book Macroanalysis: Digital Methods & Literary History, wherein he outlines what you can learn about literature when you're freed from the task of personally and carefully reading each and every book yourself.

To be clear, computational text analysis isn't the end of reading or even in competition with more traditional types of scholarship. There are only so many books that any one person can read, though, and if people want to talk about "literature" generally (or even slightly more specifically, on topics like "Irish literature" or "gothic novels" or whatever) rather than just a particular text, they have been stuck with what amounts to only anecdotal evidence, based on the small number of books out of the whole that they've had time to read.

"What I'm trying to do is provide a much larger context in which to understand those individual books," Jockers said. "And you can't do that by reading the books because there's too many. As a proxy for that, there are things you can calculate and quantify. No, it's not the same thing as reading 3,500 books, but it's the only alternative."

So Jockers looks at massive amounts of literature for patterns that emerge in both metadata about the books—biographical facts about the authors, the time and place the book was written in—as well as the text of the books themselves—stylistic tics, setting, even looking at the sentiment of words and mapping out plot.

Hiding in the text, perhaps imperceptibly to the reader, is the author. That's how we got onto the Shakespeare thing.

Procession of Shakespeare characters. Image: Wikimedia Commons

"My earliest text analysis work was all in authorship attribution," Jockers told me. "People have fingerprints that can be used to identify them and people have habits of word usage patterns that can be used to identify them."

"What I got interested in in Macroanalysis was sort of a large-scale level attributions—do male and females have different patterns of usage in novels? Do British, Irish, and American authors have different patterns?" he said. "One of the tests I did: Just using 'the' alone, can you identify British, Irish or American authors? And I found that just using that word, you could get 62 percent accuracy. It's because Americans use 'the' at about a full percentage rate more frequently than British: 6 versus 5 percent."

Apparently, the American habit of using definite articles in sentences like "I had to go to the hospital" really adds up when compared to the British proclivity towards just "going to hospital."
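
For the curious, the single-word test Jockers describes can be approximated in a few lines of Python. This is only a rough sketch, not his method: the function names `the_rate` and `guess_origin` are made up for illustration, the 6 and 5 percent reference rates are the round figures he quotes, and the nearest-rate rule is an assumption standing in for whatever classifier he actually used.

```python
import re

# Round figures quoted in the interview; illustrative, not measured here.
REFERENCE_RATES = {"American": 0.06, "British": 0.05}


def the_rate(text: str) -> float:
    """Return the share of word tokens that are exactly 'the'."""
    # Crude tokenizer (an assumption): runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return tokens.count("the") / len(tokens)


def guess_origin(text: str) -> str:
    """Pick the label whose reference rate is closest to the text's rate."""
    rate = the_rate(text)
    return min(REFERENCE_RATES, key=lambda label: abs(REFERENCE_RATES[label] - rate))


if __name__ == "__main__":
    sample = "I had to go to the hospital"
    print(f"{the_rate(sample):.3f}", guess_origin(sample))
```

On a single sentence the guess is meaningless. Any signal would only show up across whole novels, and even then Jockers reports just 62 percent accuracy.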

Anyway, Jockers had dabbled in controversial authorship identification before, he told me, looking at who wrote the Book of Mormon, specifically to address whether Joseph Smith and early Latter Day Saints historian Sidney Rigdon plagiarized it from an unpublished manuscript by a man named Solomon Spalding.

"I didn't have a particular person in the race. I was invited to work on this question by another scholar, and, in my opinion, the results of our study didn't provide a definitive answer to who wrote the Book of Mormon, but it provided evidence supporting one particular theory," he said.

Modest as he makes the findings sound, coming out in favor of the Spalding/Rigdon authorship predictably set off a round of scholarly debate and internet anger, Wikipedia revisions, the works. The Book of Mormon is a religious text, after all.

"Anyway, after I had published that paper, someone called or wrote me to ask if I'd chime in on one of the Shakespeare things," Jockers said, "and I just said no way. The Shakespeare thing is an absolute lion's den. There are people who care deeply about that topic, I've reviewed papers on the topic and it's just not one I want to get into."

That's true. People care so much that even when the Supreme Court Justices waded into the debate, they resorted to name-calling. Former Justice John Paul Stevens said he thought that the Elizabethan courtier Edward de Vere, the seventeenth Earl of Oxford, is the real Shakespeare, but critics of this theory are apparently quick to point out that it was founded by a man named "Looney," which Stevens acknowledges hurts the cause. To avoid slurs via Looney's name, the preferred term for someone who believes this theory is "Oxfordian."

The court at the time didn't break on predictable lines, with the conservative Antonin Scalia joining up with the bow-tie-wearing liberal Stevens, but it did split along a predictable slobs-versus-snobs continuum.

Scalia told the Wall Street Journal that his wife "thinks we Oxfordians are motivated by the fact that we can't believe that a commoner could have done something like this, you know, it's an aristocratic tendency," while Scalia's theory is that "it is probably more likely that the pro-Shakespearean people are affected by a democratic bias than the Oxfordians are affected by an aristocratic bias."

In the same article Stevens acknowledged that "a lot of people like to think it's Shakespeare because...they like to think that a commoner can be such a brilliant writer...Even though there is no Santa Claus, it's still a wonderful myth."

So, yeah—even among people whose lives and livelihood have nothing to do with Shakespeare, but who are supposed to be really good at sorting out arguments, you're stuck between being an aristocratic snob or believing in a wonderful, Santa-Claus-style myth. You can imagine how actual Shakespeare scholars talk about this.

"The Shakespeare controversy...was one of the origins of the willful ignorance and insidious false balance that is now rotting away our capacity to have meaningful discussions," one-time Shakespeare professor Stephen Marche wrote in the New York Tim​es.

For all the potential that computational text analysis brings, Jockers knows hell hath no fury like an English professor scorned, and it's enough to send the Shakespeare authorship question into the realm of questions that are impossible to discuss, along with pit bull ownership and breastfeeding but not, surprisingly, the veracity of a religion's foundational book.

"It really is a very very contested and hotly debated area. People involved in this get mean," Jockers said, "and I'm saying this as someone who wrote about a religious text."