The nEmesis system geolocates tweets and analyzes them to see who's suffering from a foodborne disease.
If taking time in the middle of a bout of food poisoning to update your Twitter followers “OMG so sick ughh” isn’t peak oversharing, I don’t know what it is. But this growing proclivity for expressing our every feeling at every minute has some useful applications too—for one, helping improve public health.
The latest development in the evolution of social media as an epidemiological tool is the nEmesis, a machine learning system that tracks where people tweet about food poisoning.
Here’s the gist. The tool “listens to” millions of tweets and watches for keywords that could indicate that someone’s Chinese food dinner didn’t go over well—"threw up," "so sick omg," "ughhh cramps," and that kind of thing.
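To make the "listening" step concrete, here is a minimal sketch of a keyword filter in Python. It is not the researchers’ code: the phrase list is only the handful of examples quoted above plus "food poisoning," and the function name looks_sick is made up for illustration.

```python
import re

# Hypothetical symptom phrases, taken from the examples quoted in the article.
SYMPTOM_PATTERNS = [
    r"\bthrew up\b",
    r"\bso sick\b",
    r"\bcramps\b",
    r"\bfood poisoning\b",
]
SYMPTOM_RE = re.compile("|".join(SYMPTOM_PATTERNS), re.IGNORECASE)


def looks_sick(tweet_text: str) -> bool:
    """Return True if the tweet contains any of the symptom phrases."""
    return SYMPTOM_RE.search(tweet_text) is not None


# Made-up sample tweets, for illustration only.
tweets = [
    "OMG so sick ughh",
    "Great dinner with friends tonight!",
    "ughhh cramps, never eating there again",
]
flagged = [t for t in tweets if looks_sick(t)]
```

A filter this crude will also flag things like "so sick of this weather," which is one reason keyword matching alone isn’t enough.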
It then geolocates the tweets (the ones that publish GPS data, at least) and compares them to Google Maps locations of nearby restaurants that the person could have been visiting. To account for the fact that someone could tweet from a restaurant at 8 PM and again at midnight from the bathroom floor, the tool opens up a separate data collection process that listens for new tweets from each person after they dine.
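As a rough sketch of how that matching could work, the Python below measures the distance between a tweet’s GPS fix and each restaurant, then keeps a user’s tweets from a window after the visit. Everything specific here is an assumption for illustration: the 25-meter radius, the 72-hour window, the function names and the coordinates are invented, not taken from the paper.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in meters."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def nearby_restaurants(tweet_lat, tweet_lon, restaurants, max_distance_m=25.0):
    """Names of restaurants within max_distance_m of the tweet (assumed radius)."""
    return [
        name
        for name, lat, lon in restaurants
        if haversine_m(tweet_lat, tweet_lon, lat, lon) <= max_distance_m
    ]


def tweets_after_visit(visit_time, user_tweets, window=timedelta(hours=72)):
    """Keep (timestamp, text) pairs posted within `window` after the visit."""
    return [
        (ts, text)
        for ts, text in user_tweets
        if visit_time < ts <= visit_time + window
    ]


# Hypothetical restaurants and tweets, for illustration only.
restaurants = [("Cafe A", 40.7301, -73.9950), ("Diner B", 40.7401, -73.9950)]
visited = nearby_restaurants(40.7300, -73.9950, restaurants)

visit_time = datetime(2013, 1, 1, 20, 0)
user_tweets = [
    (datetime(2013, 1, 1, 19, 0), "heading out for dinner"),
    (datetime(2013, 1, 2, 0, 0), "ughhh cramps"),
    (datetime(2013, 1, 10, 9, 0), "back at work"),
]
followups = tweets_after_visit(visit_time, user_tweets)
```

With these made-up numbers, only the restaurant about 11 meters from the tweet counts as a possible visit, and only the midnight tweet falls inside the follow-up window.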
After a four-month-long test in New York City, the system collected 3.8 million tweets from more than 94,000 users, traced 23,000 restaurant visitors, and found 480 reports of likely food poisoning, according to the research paper, which will be presented at the Conference on Human Computation & Crowdsourcing in November.
The next part is the most interesting. Based on that data, the tool assigned health grades to the restaurants and found they were a fairly close match to the grades from the Department of Health. That means a mass of ill people with smartphones could give nearly as accurate a report on food safety as official government inspections, with the added advantage that it arrives in real time. That’s pretty significant.
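The article doesn’t say how the match between the two sets of grades was measured. One plausible way to check, sketched below in Python, is to compute each restaurant’s share of visitors who later reported feeling ill and correlate it with an inspection score where higher means worse. The restaurants and numbers are invented for illustration, and the Pearson correlation is a stand-in, not necessarily the measure the researchers used.

```python
from math import sqrt


def sick_rate(sick_reports, visitors):
    """Fraction of tracked visitors who later tweeted about feeling ill."""
    return sick_reports / visitors if visitors else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: (sick reports, tracked visitors, inspection violation points)
restaurants = {
    "Cafe A": (0, 200, 5),
    "Diner B": (4, 200, 15),
    "Grill C": (8, 200, 25),
}
rates = [sick_rate(s, v) for s, v, _ in restaurants.values()]
points = [p for _, _, p in restaurants.values()]
agreement = pearson(rates, points)

# The citywide figures from the trial: 480 reports among 23,000 traced visitors.
overall_rate = sick_rate(480, 23_000)
```

A correlation near 1 would mean the tweet-based ranking lines up with the inspectors’ ranking. Applied to the trial’s own totals, the same rate function gives roughly 2 percent of traced visitors reporting likely food poisoning.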
Of course, there are kinks to work out before Twitter is redefined as a public health organization. The main one is how to better find the signal through the noise, and there’s a lot of noise—some 9,000 tweets per second worldwide.
This is where the machine learning element comes in. The nEmesis algorithm mines the fire hose of tweets for the sick keywords, and then human reviewers, who can better understand the complexity of language, refine the search. They correct the computer’s mistakes and guide it to learn. Researchers crowdsourced this part of the analysis with workers from Amazon’s Mechanical Turk.
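Here is a toy Python version of that correct-and-learn loop. It uses a simple perceptron over words as a stand-in, since the article doesn’t name the model nEmesis uses, and the labeled tweets are invented to play the role of the Mechanical Turk workers’ judgments.

```python
def tokens(text):
    """Lowercase, whitespace-separated words of a tweet."""
    return text.lower().split()


class TweetPerceptron:
    """Tiny word-weight classifier; a stand-in, not the nEmesis model."""

    def __init__(self):
        self.weights = {}
        self.bias = 0.0

    def predict(self, text):
        score = self.bias + sum(self.weights.get(w, 0.0) for w in tokens(text))
        return score > 0

    def learn(self, text, is_sick):
        """Adjust weights only when the guess disagrees with the human label."""
        if self.predict(text) != is_sick:
            step = 1.0 if is_sick else -1.0
            for w in tokens(text):
                self.weights[w] = self.weights.get(w, 0.0) + step
            self.bias += step


# Hypothetical human-labeled tweets: (text, does it describe real illness?)
labeled = [
    ("threw up after dinner", True),
    ("so sick of this traffic", False),
    ("stomach cramps all night", True),
    ("sick beat on this new track", False),
]

model = TweetPerceptron()
for _ in range(5):  # a few passes over the labeled examples
    for text, is_sick in labeled:
        model.learn(text, is_sick)
```

After training, the model has learned that "sick" next to "traffic" or "beat" is not a symptom, which is the kind of correction a plain keyword search can’t make on its own.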
“By placing the signal in context, a seemingly random collection of online rants suddenly becomes an actionable alert,” wrote Sadilek. “The Twitter reports are not an exact indicator—any individual case could well be due to factors unrelated to the restaurant meal—but in aggregate the numbers are revealing."
Foodborne illness is a big health problem, especially in developing countries. Even in the US, the CDC estimates that 47.8 million people (about one in six) get sick from foodborne disease every year, and over 3,000 die from it.
More to the point, it’s a preventable problem, but one of the biggest challenges to preventing food poisoning is getting timely data about it. In New York, for example, many restaurants are only inspected once a year, and budget problems aren’t helping that rate. It’s hardly an efficient warning system. The Twitter tracking model could complement the health inspection system, researchers say. It would be like a first pass for health officials: based on the aggregate of millions of 140-character updates, officials could decide which venues were suspicious and in need of an inspection.
The nEmesis, developed at the University of Rochester by Adam Sadilek, who’s now a data scientist for Google, builds on earlier research into big data from social media as a health tracking tool. Twitter was found to be an effective way to track the spread of the flu and to analyze how lifestyle factors affect mental and physical health.
I have to wonder though, if people knew their personal updates were being used to draw conclusions about public health and safety, would they keep being quite so candid about their lives? Maybe you’d think twice about those mid-vomit tweets.