Image via mkhmarketing on Flickr
It's a well-worn observation that when we share on social media sites, we are not sharing just with our connections. We are also sharing with the companies that run these sites and their advertisers, both of which find our freely given information extraordinarily valuable. However, our tendency to document even the most unremarkable minutiae also benefits another group of people: academic researchers.
In a paper published last week in PLOS ONE, researchers affiliated with the University of Pennsylvania and the University of Cambridge used this abundant source of data to better understand the ways in which people use language. They were particularly interested in quantifying how language use relates to personality, gender, and age.
This study is reportedly the largest of its kind. Around 19 million Facebook statuses were obtained from 136,000 participants. The latter number was eventually cut down to 75,000. “We restrict our analysis to those Facebook users meeting certain criteria,” the researchers noted. Users needed to be native English speakers, have written at least 1,000 words in their statuses, be under 65 years old, and share their age and gender.
Female and male Facebook trends plotted on a word cloud, via the paper. The center cloud shows frequency of words, with green clouds showing which words are related in a single topic.
In their analysis, the researchers used a new methodology called the open-vocabulary technique, which they sought to validate in the course of the paper. Open-vocabulary means that the data was not constrained by a predetermined list of words that researchers were looking for, as is the case in much linguistic research.
Rather, the data (the words and phrases people actually use in Facebook statuses) was allowed to speak for itself. An upside to this approach is that it opens up the possibility of all sorts of unexpected conclusions that a fixed list of words would make unlikely.
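To make the distinction concrete, here is a minimal Python sketch contrasting the two approaches. It is only an illustration, not the researchers' code: the function names, the tiny `lexicon`, and the sample statuses are all made up for this example, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a status and split it into rough word tokens (crude on purpose)."""
    return re.findall(r"[a-z']+", text.lower())


def closed_vocab_counts(statuses, lexicon):
    """Closed vocabulary: count only words on a predetermined list."""
    counts = Counter()
    for status in statuses:
        for token in tokenize(status):
            if token in lexicon:
                counts[token] += 1
    return counts


def open_vocab_counts(statuses):
    """Open vocabulary: count every word that actually appears in the data."""
    counts = Counter()
    for status in statuses:
        counts.update(tokenize(status))
    return counts


# Hypothetical sample data, invented for illustration only.
statuses = ["So happy today, beer with friends", "Happy happy weekend"]
lexicon = {"happy", "sad"}

closed = closed_vocab_counts(statuses, lexicon)
opened = open_vocab_counts(statuses)
```

With the fixed list, "beer" and "weekend" never show up in the counts because nobody thought to look for them. The open-vocabulary counts keep them, which is where the unexpected findings can come from.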
To represent their results, the researchers primarily used the ever-popular word cloud. Beyond just being trendy, they point out that word clouds make the most sense given that “the individual words and phrases we depict in it are the actual results we wish to summarize.”
The same concept as the above, this time split by age.
Within the clouds, word size is determined by the strength of its correlation with whatever factor is being examined, whether demographic or psychological. Bigger words indicate stronger correlation. Colors denote how frequently a word is used, with darker colors indicating higher frequency.
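As a rough sketch of that mapping, the Python below turns a correlation into a font size and a usage count into a shade. Everything here is an assumption made for illustration: the paper's actual statistics and rendering may differ, and the `pearson` and `cloud_style` helpers, the point sizes, and the 0-to-1 darkness scale are hypothetical.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation, so no measurable correlation
    return cov / math.sqrt(var_x * var_y)


def cloud_style(correlation, frequency, max_frequency, min_pt=10, max_pt=40):
    """Map correlation strength to font size and usage frequency to darkness.

    Returns (font size in points, darkness from 0.0 = lightest to 1.0 = darkest).
    The size range and darkness scale are arbitrary choices for this sketch.
    """
    strength = min(abs(correlation), 1.0)
    size = min_pt + (max_pt - min_pt) * strength
    darkness = frequency / max_frequency if max_frequency else 0.0
    return round(size), round(darkness, 2)


# Hypothetical numbers: how often a word appears per user vs. the users' ages.
word_rate = [1, 2, 3]
ages = [2, 4, 6]
r = pearson(word_rate, ages)
style = cloud_style(r, frequency=50, max_frequency=100)
```

A word that correlates strongly but is used rarely would come out large and pale, while a common word with a weak correlation would be small and dark.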
Word clouds didn’t make sense for everything, though. “Word clouds allow one to easily view the features most correlated with polar outcomes,” the paper notes. “We use other visualizations to display the variation of correlation of language features with continuous or ordinal dependent variables such as age.”
Frequency of certain phrases used on Facebook plotted over users' ages
The open-vocabulary method proves useful, and many of the results are hilariously revealing, if a little stereotypical. Introverts apparently like Japanese culture a lot. The use of phrases like “hehe jk” and emoticons drops off sharply after age thirteen. College-aged folks talk about getting drunk, while those getting closer to thirty prefer to just discuss beer. Women talk more about emotions, while men talk more about things.
My main takeaway? We sure use the word "fuck" a lot.