Algorithms Could Help Predict a Book's Success

By analysing stylistic features, a computer model was able to judge how successful different works of fiction were.
Image via Flickr/Abhi Sharma

Over the past five years we’ve seen a lot of developments in the way we consume literature (yes, a time before Kindles and iBooks is really that recent), and it's got publishing houses very worried. But technology hasn't yet usurped the role of publishers when it comes to picking and choosing the books that make it on to our shelves (or devices) in the first place. There's no robot to sift through the piles of manuscripts and pick out the next Harry Potter.

Now, research from Stony Brook University in New York suggests an algorithm could in fact be capable of estimating a book's chances at popular success. In a recent study, computer scientists found that a computer model could tell which works of fiction, out of a selection of already-published books, had been successful, based on some of their stylistic features.

Obviously, the most appealing application of a tool like this would be to predict the commercial success of a book before it goes to print. “Predicting the success of novels is a curious question among publishers, professional book reviewers, aspiring and even expert writers alike,” wrote the authors in their paper, which was published by the Association for Computational Linguistics. Picking winners is an art currently based on little more than experience, taste, and hype, and it’s far from a perfect system. “Indeed, even some of the best sellers and award winners can go through several rejections before they are picked up by a publisher,” the researchers said.

Far from this subjective, qualitative approach, their statistical model used cold, hard, quantitative measurements. And the researchers found it could effectively distinguish successful novels from not-so-popular works (that were nevertheless still good enough to get the go-ahead from discerning publishers) with a success rate of 84 percent. It even worked for movie transcripts, for which the accuracy rose to 89 percent.
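To make the 84 percent figure concrete: one plain reading of "success rate" is classification accuracy, the share of books the model put in the right pile. The sketch below assumes that reading; the function name and the labels are made up for illustration and are not the researchers' data or code.

```python
def accuracy(predicted, actual):
    """Share of items whose predicted label matches the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not actual:
        raise ValueError("need at least one label")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels for illustration only: True means "successful".
actual = [True, True, False, False, True]
predicted = [True, False, False, False, True]

print(accuracy(predicted, actual))  # 4 of the 5 labels match
```

On this reading, an accuracy of 0.84 would mean the model sorted 84 of every 100 books correctly.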

The model did this by looking at specific stylistic features, rather than overarching themes, plots, characters, or emotional tones (judging these would likely require the input of a well-read human).

So for budding authors looking to pen a bestseller, what were the markers of success? One finding was that books that used a lot of verbs to describe thought processes tended to be more successful than those heavy on action verbs. It also seems that people don’t go for over-emotional verbs like “cried” and “cheered” when it comes to dialogue, and prefer authors to get straight to the point and just use “say.”

“Also, more successful books use discourse connectives and prepositions more frequently, while less successful books rely more on topical words that could almost be cliché, e.g., ‘love,’” the researchers wrote. And it was better to describe things using nouns and adjectives, not verbs and adverbs.
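For a rough sense of what "stylistic features" can look like in practice, here is a minimal sketch that measures how often a text uses words from a few categories mentioned above. The word lists, the category names, and the `style_features` function are all assumptions made for illustration; the article does not describe the researchers' actual lexicons or feature pipeline.

```python
import re
from collections import Counter

# Illustrative word lists only. These are assumptions, not the study's lexicons.
FEATURE_SETS = {
    "thinking_verbs": {"think", "thought", "know", "knew", "remember",
                       "remembered", "realize", "realized"},
    "action_verbs": {"run", "ran", "jump", "jumped", "grab", "grabbed",
                     "push", "pushed"},
    "connectives": {"and", "but", "because", "although", "however", "so",
                    "while"},
    "prepositions": {"in", "on", "at", "of", "with", "from", "to", "by"},
}


def style_features(text):
    """Return, per category, the fraction of tokens that fall in that category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {name: 0.0 for name in FEATURE_SETS}
    counts = Counter(tokens)
    return {
        name: sum(counts[word] for word in words) / total
        for name, words in FEATURE_SETS.items()
    }


sample = "She knew the answer because she remembered it"
for name, value in style_features(sample).items():
    print(f"{name}: {value:.3f}")
```

Numbers like these, computed for every book, are the kind of input a statistical classifier could learn from. A real system would likely use a part-of-speech tagger rather than hand-picked lists.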

The “success” of a book is of course a debatable quality, and this study essentially equated it with popularity from limited sources. The books studied were taken from a selection on Project Gutenberg, and the download counts of each work were used as the main indicator of success. For a few books, the researchers also took into account Amazon sales or prestigious awards like the Pulitzer and Nobel prizes. For movies, the film’s average score on IMDb was used as a measure of success. While none of these markers necessarily represents the true literary value of a work, which is still a very human judgement, they do suggest to some degree which works are most crowd-pleasing. And that's no doubt of interest to publishers looking to hit the bestseller list.

The researchers also used their model to reconsider the common belief that readability (i.e. being easy to read) is a desirable quality in a book. What they found defied the conventional wisdom. “We made an unexpected observation on the connection between readability and the literary success—that they correlate into the opposite directions,” they wrote.
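Readability can be put into numbers too. The article does not say which measure the researchers used, so as an assumption this sketch uses the well-known Flesch Reading Ease formula, where a higher score means easier text. The syllable counter is a crude vowel-group heuristic, and both function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough estimate: count runs of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier-to-read text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


easy = "The cat sat."
hard = "Institutional considerations necessitate deliberation."
print(flesch_reading_ease(easy) > flesch_reading_ease(hard))
```

Under the study's finding, books with lower scores on a measure like this (that is, harder reads) would tend to be the more successful ones.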

How useful their findings could be in the real-world application of predicting future books' success is up for debate, and some publishers and authors have taken issue with the method, suggesting for instance that the topic of a work is in fact more important to its success than stylistic details.

And of course, in the real world, factors like an author’s popularity also play a large role in how quickly books fly off the shelves. Publishers probably don't need to worry about robots taking their remaining jobs just yet.