An English professor crunched through 50,000 books to distill the six (sometimes seven) essential plot arcs.
In fiction, as in a cemetery, there's a limited number of plots. We just aren't sure how many. Carlo Gozzi, an 18th-century Italian playwright, thought there were 36 dramatic situations, but ever since then the number has been going down, cratering with Christopher Booker's popular 2004 book The Seven Basic Plots.
But now, with the help of CERN-level mathematics and computers, researchers have evidence that the appropriately named Booker was off by just one, probably.
"I did some distance similarity metric calculations and machine clustering to see if I could identify archetypal plot shapes," Matthew Jockers told me over the phone. "The short answer is, yes I did, and there's six or sometimes seven."
That little ambiguity, Jockers explained, is because the data collecting and sorting technique "involves picking at random from 50,000."
"There's six about 90 percent of the time," Jockers said. "Ten percent of the time, the computer says there's a seventh [plot shape]."
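Jockers doesn't spell out his exact distance metric or clustering algorithm here, so what follows is only a rough Python sketch of the general idea, under stated assumptions: each book's plot arc is taken to be already reduced to a fixed-length list of sentiment values, distance is plain Euclidean distance, and the clustering is ordinary k-means. That is a stand-in that may differ from what Jockers actually used, and the function names and toy arcs are hypothetical.

```python
import math
import random

def distance(a, b):
    """Euclidean distance between two equal-length plot arcs (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_arc(arcs):
    """Point-by-point average of a group of arcs."""
    return [sum(values) / len(arcs) for values in zip(*arcs)]

def cluster_arcs(arcs, k, iterations=20, seed=0):
    """Hypothetical helper: plain k-means over plot arcs; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Farthest-first initialization: start from one random arc, then repeatedly
    # add the arc lying farthest from every centroid chosen so far.
    centroids = [list(rng.choice(arcs))]
    while len(centroids) < k:
        farthest = max(arcs, key=lambda a: min(distance(a, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(arcs)
    for _ in range(iterations):
        # Assignment step: each arc joins its nearest centroid.
        labels = [min(range(k), key=lambda i: distance(arc, centroids[i])) for arc in arcs]
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [arc for arc, label in zip(arcs, labels) if label == i]
            if members:
                centroids[i] = mean_arc(members)
    return centroids, labels

# Toy, hypothetical arcs: three that rise and three that fall.
rising = [[0, 1, 2, 3], [0, 1.1, 2, 3.2], [0.1, 0.9, 2.1, 2.9]]
falling = [[3, 2, 1, 0], [3.1, 2, 1, 0.1], [2.9, 2.1, 0.9, 0]]
centroids, labels = cluster_arcs(rising + falling, k=2)
print(labels)
```

Here k is fixed by hand at 2 for two obviously different toy shapes. In a real run the number of clusters would itself have to be chosen, and because the arcs are drawn at random from a large corpus, a different sample can tip that choice, which is one plausible reading of how six shapes sometimes become seven.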
Matthew Jockers is a University of Nebraska English professor working at the forefront of "digital humanities," where databases and computing tools are used to analyze more books than any one person could read.
According to the work of the Russian structuralist Vladimir Propp, there are roughly two ways of looking at what "plot" is: the way events unfold in the world of the story, and the way the author reveals those events to the reader.
Jockers is focusing on the latter. He has built a model that abstracts the structure of a plot algorithmically by tracking how sentiment changes over the course of a story, producing a kind of plot graph. He hasn't yet revealed what those plots are (Man versus Dataset?), but this week he released his tools on the website GitHub, so you can try the model and map plots at home.
When Jockers told me about this project to uncover plot shapes, I said it reminded me of a Kurt Vonnegut piece in which the author plots out plots on a graph. Jockers said that the piece was not only a great way of illustrating his own project but had also helped inspire it.
If you're curious about where Vonnegut falls in the "number of plots" debate, he gives two bonus plot shapes in the written version of the lecture. Before Vonnegut starts drawing on the chalkboard, he remarks that "there's no reason why the simple shapes of stories shouldn't be fed into a computer," and Jockers took that as his cue.
With the help of friends in the physics department, Jockers figured out how to chart emotional valence by drawing on a "controlled vocabulary of positive and negative sentiment markers collected by Bing Liu of the University of Illinois at Chicago" and a machine model that Jockers built "to identify and score passages as positive or negative," he wrote on his blog.
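The lexicon idea can be sketched in a few lines of Python. This is a minimal illustration, not Jockers's implementation: the tiny word lists are a hypothetical stand-in for Bing Liu's much larger vocabulary, sentences are split naively on punctuation, each sentence's score is simply its count of positive words minus its count of negative words, and a moving average (an assumption; Jockers's own tools use their own smoothing) turns the jittery sentence scores into a smoother arc.

```python
import re

# Hypothetical stand-in for a sentiment lexicon such as Bing Liu's, which lists
# thousands of positive and negative words; these few are for illustration only.
POSITIVE = {"happy", "love", "joy", "triumph", "kind"}
NEGATIVE = {"sad", "death", "fear", "loss", "cruel"}

def sentence_scores(text):
    """Score each sentence as (# positive words) - (# negative words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    scores = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        scores.append(sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words))
    return scores

def moving_average(values, window):
    """Smooth raw scores with a centered moving average (assumed smoothing choice)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

story = ("They were happy and in love. Joy filled the house. "
         "Then came fear. A cruel loss followed, and death. "
         "At last, a kind stranger brought joy and triumph.")
raw = sentence_scores(story)
arc = moving_average(raw, window=3)
print(raw)  # → [2, 1, -1, -3, 3]
```

The raw scores climb, dip, and recover, and the smoothed arc traces that fall-then-recovery shape; on a real novel the same steps would run over thousands of sentences.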
If you're so inclined, you can follow Jockers's more-or-less successful journey toward mapping plots on his blog. In his book Macroanalysis, Jockers looked at Irish literature, and his examples are often popular Irish titles, such as Oscar Wilde's The Picture of Dorian Gray or James Joyce's A Portrait of the Artist as a Young Man.
Most books that measure the number of plots seem aimed at writers and would-be writers, but Jockers's work has implications for readers, librarians, and even literature snobs, or anyone who wants to put snobs in their place.
As he was charting plots, Jockers noticed that some genres that are derided for being "formulaic," like romance, aren't just relying on boy-meets-girl.
"Romance showed some proclivity for two of the six plot shapes, but it wasn't an overwhelming case of all the plots falling into one," Jockers said. "It was a much more evenly distributed from these six shapes."
So suck on that, would-be doubters of romance, as well as anyone who doubts that computers can make life simpler or easier. They say that every Seinfeld story would be unnecessary if everyone just had a cell phone, but thanks to computers, we've managed to shave yet another plot off the list of possible plots.