Wikipedia's Gender Problem Has Finally Been Quantified
Researchers used a computer model to assess the articles for evidence of gender bias.
Image: Giulia Forsythe/Flickr
Wikipedia doesn't have the greatest track record when it comes to gender equality. There's a noted lack of women editors and contributors, and there has been no significant effort to improve that balance. But while many suggest the structural gender imbalance leads to biased content on the site, there hasn't been any quantitative analysis to determine whether that's the case. Until now.
A group of researchers from Germany and Switzerland analyzed Wikipedia's articles across six languages to look for evidence of gender bias and inequalities, publishing their findings on the arXiv preprint server today. Their study found that while the site is equal (and even biased towards women) in some ways, the language used within articles shows serious signs of gender bias.
Before we dig into their research, a little background on Wikipedia's history of inequality. In 2010, an internal Wikimedia study revealed that only 13 percent of its contributors were women. In response to this imbalance, Sue Gardner, then executive director of the Wikimedia Foundation (the organization that oversees Wikipedia), vowed to increase the share of female contributors to 25 percent by 2015.
But just last summer, founder Jimmy Wales told the BBC that simply wasn't going to happen.
"The goal was to do that by 2015 and we've completely failed," he said. "That's a target that we set for ourselves and we're really doubling down our efforts now. We realized we didn't do enough. There's a lot of things that need to happen to get from around 10 percent to 25 percent."
More recently, Wikipedia has faced harsh criticism after the site's arbitration committee (a group of community-elected, long-term editors who make decisions about user conduct and site standards) made a preliminary ruling on the editing of the Gamergate controversy page. The ruling would see a number of feminist editors (along with others) banned from contributing to the page and other gender-related pages. Some critics viewed the decision as an attack on feminist voices and a virtual victory for Gamergate supporters, though the arbitration committee said that's a mischaracterization of its pending actions.
With the noted lack of female contributors and heated debates over pages like Gamergate, many argue that the contributor imbalance has led to a content imbalance. Claudia Wagner, a post-doctoral researcher at the Leibniz Institute for the Social Sciences in Germany, and researchers from a pair of other institutions wanted to find out whether that was actually the case.
Wagner and her colleagues used a computer model to assess the articles for evidence of four different kinds of bias: coverage bias (the number of articles about notable women compared to notable men), structural bias (where links within articles go), lexical bias (difference in language used), and visibility bias (how many articles make it to Wikipedia's front page).
The results were mixed. When it came to coverage bias, Wikipedia was remarkably equal in the number of articles about notable men and women in all six languages when compared to other reference databases. In fact, the researchers found Wikipedia "exhibited an over-representation of women" compared to other databases.
The researchers also discovered the home page articles are equally representative of the genders, stating they did not find "any evidence" of a male bias.
But where they did find evidence of gender bias was in the content of the articles themselves. Wagner and her partners found that articles about women more often linked to articles about men than vice versa. Even more noteworthy was the discrepancy in the kind of language used in articles about women compared to articles about men. For one, articles about women were more likely than articles about men to use words referencing their subject's gender.
"We also noticed that the relationship status and family related issues seem to be more extensively discussed in articles about women since words like 'married,' 'divorce,' 'children,' or 'family' are much more frequently used in articles about women," the report reads.
"This gender inequality cannot simply be explained by the imbalance in the coverage or in the pure existence of notable men and women, but shows that men and women are indeed presented differently on Wikipedia," it continues.
The team used likelihood ratios to determine whether this was more than coincidence and discovered that between 23 and 32 percent of the 150 most used words in articles about women fall under a category of family, relationship, or gender. In articles about men, just 0 to 4 percent of the most common words fall under those categories (words about the individual's occupation are more common).
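For readers curious how a comparison like this can be computed, here is a minimal sketch in Python. It is not the researchers' code, and the article does not spell out their exact procedure: the function names, the smoothing constant, the toy articles, and the category word list are all assumptions made for illustration. It scores each word by how much more likely it is to appear in an article about a woman than in one about a man, then checks what share of the top-scoring words belong to a chosen category.

```python
from collections import Counter


def likelihood_ratios(docs_a, docs_b, smoothing=1.0):
    """Return {word: P(word appears | group A) / P(word appears | group B)}.

    Probabilities are smoothed document frequencies, so a word absent
    from one group does not cause a division by zero. (Assumed scheme.)
    """
    def doc_freq(docs):
        # Count each word at most once per article.
        return Counter(w for doc in docs for w in set(doc.lower().split()))

    df_a, df_b = doc_freq(docs_a), doc_freq(docs_b)
    n_a, n_b = len(docs_a), len(docs_b)
    ratios = {}
    for word in set(df_a) | set(df_b):
        p_a = (df_a[word] + smoothing) / (n_a + 2 * smoothing)
        p_b = (df_b[word] + smoothing) / (n_b + 2 * smoothing)
        ratios[word] = p_a / p_b
    return ratios


def category_share(ratios, category_words, top_n=150):
    """Share of the top_n highest-ratio words that fall in category_words."""
    top = sorted(ratios, key=ratios.get, reverse=True)[:top_n]
    if not top:
        return 0.0
    return sum(word in category_words for word in top) / len(top)


# Hypothetical toy data, invented for this example only.
ARTICLES_WOMEN = ["she married actor", "her family singer", "she wrote novels"]
ARTICLES_MEN = ["he played football", "he wrote novels", "he was actor"]
FAMILY_OR_GENDER = {"married", "divorce", "children", "family", "she", "her"}

share_women = category_share(
    likelihood_ratios(ARTICLES_WOMEN, ARTICLES_MEN), FAMILY_OR_GENDER, top_n=5
)
share_men = category_share(
    likelihood_ratios(ARTICLES_MEN, ARTICLES_WOMEN), FAMILY_OR_GENDER, top_n=5
)
print(f"women: {share_women:.0%}, men: {share_men:.0%}")
```

On real data the inputs would be full article texts and the cutoff would be the 150 words the study mentions; the toy corpus here is far too small to say anything about Wikipedia itself.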
While it's far from a conclusive verdict on Wikipedia's gender biases, the study provides some evidence that such an imbalanced contributor base has an effect on content. Since Wales himself wants to "double down" to correct this, maybe further analysis like this will give Wikipedia the tools it needs to level the playing field.