A 13-Year-Old Wants to Map All the Bullies in the US
Big data fights big jerks.
Image via flickr/pressmanwill
Imagine you're a parent deciding where to raise your kids. You're probably going to scope out how safe a town is, if the school district is good, if the neighbors seem friendly. A 13-year-old kid wants to add another metric to the list: whether the area has a bullying problem. He's creating a real-time map of where bullying is going on across the US by tracking social media posts on the topic.
The student, Viraj Puri, is one impressive teenager. He runs the blog Bullyvention, which is a popular resource on Capitol Hill, has worked directly with members of Congress (or "lawmakerz" to use the blog's parlance), and now is teaming up with data scientists to develop a live index of bullying in communities around the country.
The map is still in an early beta phase, but the idea is to track certain keywords on Twitter, Facebook, YouTube, and other social media sites, analyze whether they're instances of someone bullying or being bullied, and create a live index with the data that could be used to spot geographic pockets in the US where bullying is prevalent. The screenshot below is from this morning, based on between 500 and 7,000 tweets per hour. That's not a huge sample size, which is why at this point the data can look a bit misleading: an extra flurry of Twitter activity could be enough to cause, say, that huge red spot over Florida, which probably doesn't mean Key West is a hotbed for bullying.
Map of real-time mentions of bullying across social media, via Vertabox
Number of tweets per hour on January 21, via Vertabox
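To see why a small sample can paint a misleading picture, here is a minimal Python sketch of a regional index. It is not the project's actual code: the function name `bullying_index`, the `MIN_POSTS` cutoff, and the sample data are all made up for illustration. It assumes each post has already been tagged with a region and a yes/no bullying flag, and it declines to score any region with too few posts.

```python
from collections import Counter

# Hypothetical cutoff: regions with fewer posts than this are too noisy to score.
MIN_POSTS = 30

def bullying_index(posts, min_posts=MIN_POSTS):
    """posts: iterable of (region, is_bullying) pairs.

    Returns {region: share of posts flagged as bullying},
    with None for regions whose sample is below min_posts.
    """
    totals = Counter()
    flagged = Counter()
    for region, is_bullying in posts:
        totals[region] += 1
        if is_bullying:
            flagged[region] += 1
    return {
        region: (flagged[region] / n if n >= min_posts else None)
        for region, n in totals.items()
    }

# Made-up data: a handful of posts from Florida, a larger batch from New York.
sample = (
    [("FL", True)] * 3 + [("FL", False)] * 2
    + [("NY", True)] * 10 + [("NY", False)] * 90
)
index = bullying_index(sample)
```

With only five posts, Florida's raw share would be 60 percent, which is exactly the kind of red spot a short burst of activity can produce. The sketch returns no score for it at all, while New York's 100 posts yield a share of 10 percent.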
"Our goal is not only to 'predict' the likelihood of the next bullying occurrence but see where it’s happening geographically, which begs the question ‘Why it is more of an epidemic in some areas of the country over others?’" Puri said in a release about the project. "This will put more focus on quality of life and get lawmakers to address this very important issue."
There are already tools online that create live visualizations by querying publicly available geolocated social media posts. But Puri wants to get his hands on more data. He's contacted Facebook and Twitter to ask for permission to access their company data to include in the index, and wants information from Google and Snapchat too.
Of course, analyzing billions of mentions of the word "bully" and its variations isn't easy—right now if you search for the word online you're going to pull up a bunch of articles and messages about Gov. Chris Christie. So Puri's working with Dr. Xiaojin Zhu, a researcher at the University of Wisconsin who's using big data to study bullying.
Zhu is developing algorithms that scoop up thousands of tweets about bullying and use machine learning to sift through the posts. So far his research has found that only about 4 percent of mentions are actual cyberbullying attacks. Others are responses from a victim, messages speaking out against bullying, or news articles on the subject.
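For a rough sense of what sifting posts by machine learning can look like, here is a toy text classifier in Python. This is not Zhu's method, which the article doesn't detail. It swaps in a basic naive Bayes model, and the class name, the four labels (loosely modeled on the categories above), and the training sentences are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Bag-of-words naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total = sum(self.label_counts.values())
        words = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented training examples, two per hypothetical category.
examples = [
    ("you are such a loser nobody likes you", "attack"),
    ("shut up you ugly loser", "attack"),
    ("they bullied me at school today and i feel awful", "victim"),
    ("i was bullied again today", "victim"),
    ("stand up against bullying be kind", "advocacy"),
    ("we must stop bullying in our schools", "advocacy"),
    ("new study on bullying published this week", "news"),
    ("governor accused of bullying report says", "news"),
]

model = NaiveBayes()
model.fit(examples)
label = model.predict("i was bullied at school")
```

A real system would need thousands of hand-labeled posts and a far more careful model, but the shape is the same: learn which words go with which kind of post, then sort new posts into the attack, victim, advocacy, or news pile.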
There are definitely some hurdles to clear. If Twitter and Facebook don't agree to hand over more usable data, it'll be hard to get a big enough sample size to show an accurate picture of where bullying is going on, and even then the majority of posts are unrelated and have to be filtered out.
But it's an ambitious idea, and it would be interesting to see if that level of awareness sparks action. Puri hopes that an index showing where bullying is rampant and exposing the kind of abuse that's often hidden behind computer screens would get parents fired up and put pressure on policymakers to crack down on the problem.