The Wu at the 2000 BET Harlem Block Party. Image: AP Photo/Stephen Chernin

Rap Tracks Written by Algorithms May End Up Not Being Anything to Fuck With

In a fusion of something cool with something very square, Big Data has entered the realm of Biggie Smalls.

Nov 6 2015, 1:00pm

Given that rappers are the only millionaires who seem to understand how fun it is to be a millionaire, becoming a successful rapper seems like pretty much the best get-rich-quick-[or-die-trying] scheme imaginable. But artistic success can be elusive, which is why most rising hip-hop stars are forced to spend so much time creating high-profile Twitter beef instead of honing their craft. Fortunately, help is on the way.

In a fusion of something cool with something very square, Big Data has entered the realm of Biggie Smalls. Anthony Abraham and Nikhita Koul are two delightfully earnest hip-hop fans who recently graduated from the Master of Information and Data Science program at UC Berkeley. For their capstone project, they and classmate Joe Morales designed the Rap Analysis Project (note: spells R.A.P.), which applies "machine learning techniques and data science principles to a database of rap lyrics from 1980 to 2015." Among other things, they've produced a model that can predict, based on lyrics, whether a rap track will be a hit.

While their classmates were working on things like predicting baseball pitches and analyzing where electric cars are most likely to be purchased, the R.A.P. team was feeding thousands of rap lyrics into a database, trying to figure out what makes one song this year's "Trap Queen" and leaves another stuck on a CD-R in some dude's pocket.

The results are up on this website, where you can enter rap lyrics and a year, and find out if something is bound to be a hit. There are also charts showing when certain lyrical themes spike in popularity, which swear words you should use, and how many.

"Basically when we were trying to create a model for our system to find out what variables were important and which ones weren't," Abraham told me in a phone conversation, "we looked at variables like date and specifically what's in the lyrics themselves, from vocabulary sizes and the types of words and the definition of the words—and an algorithmic approach to the topics of the songs. When we had all of them, we could see which variables had the biggest impact on hit prediction."

Screenshot of the duo's theme chart, which is interactive.

"What we found is that themes were a big one, as was profanity, which is interesting because it just goes to show you that different words, profane words, and the amount of them has an impact on the model," he said. "The topics we pulled and the profanity were two of the most impactful variables in our model."

There it is: thematic content and profanity.
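For the curious, here is a minimal sketch of what pulling a few of those lyric variables out of a song might look like. It is not the R.A.P. team's code: the function name `lyric_features`, the tiny `PROFANITY` word list, and the choice of features are all assumptions made for illustration, and the topic modeling Abraham mentions is left out entirely.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word list. The project's real list
# is not described in the article.
PROFANITY = {"damn", "hell", "shit"}


def lyric_features(lyrics: str) -> dict:
    """Compute a few simple per-song features from raw lyric text."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", lyrics.lower())
    counts = Counter(words)
    total = len(words)
    # Counter returns 0 for words that never appear, so this is safe.
    profane = sum(counts[w] for w in PROFANITY)
    return {
        "total_words": total,
        "vocabulary_size": len(counts),  # number of distinct words
        "profanity_count": profane,
        "profanity_rate": profane / total if total else 0.0,
    }


print(lyric_features("Damn, damn, the beat is hot as hell"))
```

Numbers like these, computed per song and paired with a release year and a hit-or-not label, are the kind of input a classifier could be trained on. Which model the team actually used is not stated here.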

I played with it for a while, and non-emcee that I am, I just used other people's songs. Oddly enough, I had trouble finding something that wouldn't be a hit: Sure, De La Soul is good in every era, but going with the most loathed song I could think of, I tried Asher Roth's ode to frat boy antics, "I Love College," which the R.A.P. engine predicted would've been a hit in 2001, as it's similar to "'All Night' by Silkk the Shocker, 'Make It Classy' by Talib Kweli, and 'Elvis Killed Kennedy' by Vanilla Ice."

Roth's song also would've been a hit in 1991 because it's similar to "'Weed #2 - Phife Dawg' by De La Soul, 'Ain't a Damn Thing Changed' by Ice-T, and 'Memories' by Cypress Hill." So much for a golden era. Even Jay-Z's limp verse on "Monster" was a would-be hit.

With the aid of the "Swearing" chart, though, I was surprised to find that Wu-Tang's unstoppable "Shame on a Nigga" wouldn't have been a hit in 1987, but that means going almost all the way back to early rap's "disco era," when hip-hop had a lot more introducing people and a lot fewer threats to "fuck your ass up."

Screenshot of the duo's swearing chart, which is interactive.

"You can see the trends [in swearing], and it's kind of interesting, that in the 90s and early 2000s, some words drop out of favor in the 2010s before picking back up again," Abraham said.

"It's a trend in a generation of rap music to have a particular amount of swearing."

If all of this sounds like a formula for repeatable, predictable hits, that is certainly a possibility, although the model deals only with language in hip-hop and not with all the other factors, such as the production and the actual ability to rap the words (Hello, Iggy!).

"We thought [the data] would be most useful for rap music producers, because they want to know what sticks with listeners and what themes work with listeners, so they may produce more music with the same themes," Koul told me on the phone. "That's definitely one use."

Seems accurate enough!

But there are, of course, other uses. Koul said that the data could "help a service like Pandora: if people want to listen to songs about religion then the topic modeling can select only songs about that."

Abraham described how it could be useful for "figuring out the generations of hip-hop: when did the style change, or when did the history of rap and hip-hop shift."

But the most chilling idea of all came from Koul. "The other thing we were trying to solve was the automatic generation of lyrics," she said. "Pick an artist, learn the songs from this particular artist and then generate a song on a theme. For example, you could have Eminem rapping about climate change, and maybe more people would take it seriously then."

I sort of suspect that the people in power most skeptical of climate change aren't going to be convinced by Eminem, to say nothing of Mr. Mathers himself, but it is amazing to think that hologram Tupac could go on tour and eventually release an album of all new stuff, without Tupac having to lift a pen from his hideout in Cuba, where he's no doubt living as we all wish we could, if we could just find the Goldilocks Zone of swearing.