    Twitter Can Now Predict Crime, and This Raises Serious Questions

    Written by

    Jordan Pearson

    Police departments in New York City may soon be using geo-tagged tweets to predict crime. It sounds like a far-fetched sci-fi scenario à la Minority Report, but when I contacted Dr. Matthew Gerber, the University of Virginia researcher behind the technology, he explained that the system is far more mathematical than metaphysical.

    The system Gerber has devised is an amalgam of old and new techniques. Currently, many police departments target hot spots for criminal activity based on actual occurrences of crime. This approach, called kernel density estimation (KDE), involves pairing historical crime records with geographic locations and using a probability density function to estimate the likelihood of future crimes occurring in a given area. While KDE is a serviceable approach to anticipating crime, it pales in comparison to the dynamism of Twitter’s real-time data stream, according to Dr. Gerber’s research paper “Predicting Crime Using Twitter and Kernel Density Estimation”.
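
    To make the idea concrete, here is a minimal sketch of a two-dimensional Gaussian KDE over past incident locations. It is an illustration under stated assumptions, not the code used in the study: the function name `kde_density`, the bandwidth value, and the coordinates are all invented for the example.

```python
import math

def kde_density(point, incidents, bandwidth):
    """Gaussian kernel density estimate at `point` from past incident locations.

    `incidents` is a list of (x, y) coordinates; `bandwidth` controls how far
    each past incident spreads its influence. Both are illustrative assumptions.
    """
    if not incidents:
        return 0.0
    px, py = point
    # Normalising constant for a 2-D Gaussian kernel, averaged over all incidents.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(incidents))
    total = 0.0
    for ix, iy in incidents:
        sq_dist = (px - ix) ** 2 + (py - iy) ** 2
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return norm * total

# Hypothetical incident coordinates (kilometres on a local grid), not real data.
past_incidents = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0)]

near_cluster = kde_density((1.0, 1.0), past_incidents, bandwidth=0.5)
far_away = kde_density((3.0, 3.0), past_incidents, bandwidth=0.5)
```

    In this toy data, the point sitting inside the cluster of three past incidents gets a much higher density than the point far from any of them, which is what marks it as a hot spot.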

    Dr. Gerber’s approach is similar to KDE, but deals in the ethereal realm of data and language, not paperwork. The system involves mapping the Twitter environment, much as police currently map the physical environment with KDE. The big difference is that Gerber is looking at what people are talking about in real time, as well as what they do after the fact, and seeing how well the two match up. The algorithms look for certain language that is likely to indicate the imminent occurrence of a crime in the area, Gerber says. “We might observe people talking about going out, getting drunk, going to bars, sporting events, and so on. We know that these sort of events correlate with crime, and that’s what the models are picking up on.”

    Once this data is collected, the GPS tags in tweets allow Gerber and his team to pin them to a virtual map and outline hot spots for potential crime. However, not everyone who tweets about hitting the club later is going to commit a crime. Gerber tests the accuracy of his approach by comparing Twitter-based KDE predictions with traditional KDE predictions based on police data alone. The big question is, does it work? For Gerber, the answer is a firm “sometimes.” “It helps for some, and it hurts for others,” he says.
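
    The comparison described above can be sketched roughly as follows. This is a deliberately simplified stand-in, not the model from the study: it bins points into grid cells and counts keyword matches, and the cell size, keyword list, function names, and data are all hypothetical.

```python
from collections import Counter

CELL_SIZE = 1.0  # hypothetical grid cell width, in the same units as the coordinates

# Hypothetical keyword list, loosely based on the topics quoted above.
KEYWORDS = {"drunk", "bar", "bars", "game"}

def cell_of(x, y, cell_size=CELL_SIZE):
    """Map a coordinate to the (column, row) index of its grid cell."""
    return (int(x // cell_size), int(y // cell_size))

def crime_scores(incidents):
    """Baseline score: number of past incidents per cell."""
    return Counter(cell_of(x, y) for x, y in incidents)

def tweet_scores(tweets):
    """Count geo-tagged tweets per cell that mention any keyword."""
    scores = Counter()
    for x, y, text in tweets:
        if set(text.lower().split()) & KEYWORDS:
            scores[cell_of(x, y)] += 1
    return scores

def top_cells(scores, k):
    """Return the k highest-scoring cells as a set of hot spots."""
    return {cell for cell, _ in scores.most_common(k)}

def hit_rate(hot_cells, later_incidents):
    """Fraction of later incidents that fall inside the predicted hot spots."""
    if not later_incidents:
        return 0.0
    hits = sum(1 for x, y in later_incidents if cell_of(x, y) in hot_cells)
    return hits / len(later_incidents)

# Invented data for illustration only.
past = [(0.5, 0.5), (0.6, 0.4), (2.5, 2.5)]
tweets = [
    (3.2, 3.1, "getting drunk at the bar tonight"),
    (3.4, 3.3, "heading to the game"),
    (3.6, 3.8, "bars then home"),
    (0.5, 0.5, "quiet night in"),
]
later = [(0.7, 0.3), (3.5, 3.5)]

baseline = crime_scores(past)
combined = baseline + tweet_scores(tweets)

baseline_rate = hit_rate(top_cells(baseline, 2), later)
combined_rate = hit_rate(top_cells(combined, 2), later)
```

    The toy data here is constructed so that the tweet signal helps, which shows the mechanics of the comparison and nothing more. With different data the combined score could just as easily do worse than the baseline.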

    According to the study’s results, Twitter-based KDE analysis yielded improvements in predictive accuracy over traditional KDE for stalking, criminal damage, and gambling. Arson, kidnapping, and intimidation, on the other hand, showed a decrease in accuracy relative to traditional KDE analysis. It’s not clear why these crimes are harder to predict using Twitter, but the study notes that the issue may lie with the shorthand and informal language typical of Twitter, which can be difficult for algorithms to parse.

    This kind of approach to high-tech crime prevention brings up the familiar debate over privacy and the use of users’ data for purposes they didn’t explicitly agree to. The case becomes especially sensitive when the data will be used by police to track down criminals. On this point, though he acknowledges post-Snowden societal skepticism regarding data harvesting for state purposes, Gerber is unconcerned. “People sign up to have their tweets GPS tagged. It’s an opt-in thing, and if you don’t do it, your tweets won’t be collected in this way,” he says. “Twitter is a public service, and I think people are pretty aware of that.”

    Gerber insists that there is no danger of individual targeting when it comes to the use of Twitter-based crime prediction, as the system, though it records individual names, does not model individuals, nor does it identify who the actual perpetrators of crime are. Still, the problem may lie not with the targeting of individuals by police, but with the targeting of groups and neighbourhoods. The usefulness of this technology lies, after all, in the more efficient allocation of police resources (patrols, etc.) to specific geographic locations.

    However, Gerber rejects this concern. “You could say it would let police target neighbourhoods, and things like that, but they already do that with the knowledge they currently have. Police know certain neighbourhoods are bad, and they do target those with extra patrols, raids, and things like that.”

    It seems like a bit of a tautology. Twitter-based KDE analysis would not create the possibility of specific neighbourhoods or groups being targeted by police, but only because that targeting is already happening. Though the data is not there to substantiate this concern just yet, it does seem as if pre-crime modelling on Twitter in “bad” neighbourhoods (which, let’s be frank, is often code for “largely non-white”) may serve as a virtual incarnation of Stop and Frisk. Though social media-based predictive crime modelling does not explicitly target minority neighbourhoods, the effect may be the same, as Gerber notes, but with the sanitized technological alibi of mathematical analysis.

    Pre-crime analysis based on tweets may be coming to precincts in Queens and the Bronx soon, Gerber says, as the NYPD has expressed interest in pilot programs in those boroughs. However, he notes, widespread adoption of this technology may be far off, as whether it actually reduces crime rates has yet to be tested. But if the practice is picked up by technologically progressive precincts, it won’t just be your employer you’ll have to worry about seeing your tweet about knocking back a few drinks after work. It’ll be the cops, too.