But they're still pretty lousy at it.
Image: Flickr/Hans Põldoja
There's a reason you probably know not to upload photographic evidence of a Saturday night spent vomming a stomach full of liquor to Facebook: your boss might see it. But what if your employer mistakes you for a digital doppelgänger who isn't so savvy?
Such a mistake could have negative consequences if, say, my Facebook profile were linked with the Twitter feed of another Jordan Pearson who keeps posting insane conspiracy theory videos; it could mean being passed over for a job. The same goes if someone were maliciously impersonating me on the internet, which isn't unheard of.
Now, a new algorithm devised by a team of researchers based in Germany, France, and the US bumps up the accuracy of linking a profile on one social network with a matching one on another. The algorithm found a match for 29 percent of the profiles it was given (a measure referred to as recall), and 95 percent of the matches it made were accurate. This might not seem that high, but humans can't do much better. When Mechanical Turk workers were given one Twitter profile and 10 possible Facebook matches, they chose a matching profile just 40 percent of the time, with 96 percent accuracy.
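To make the two figures concrete, here is a minimal sketch of how recall and the accuracy of reported matches (usually called precision) could be computed. It assumes the article's "accuracy" means the share of reported matches that are correct. The function name and the toy numbers are invented for illustration, chosen only so the output lands near 29 and 95 percent; they are not the researchers' data.

```python
def precision_recall(reported, truth):
    """Score a matcher's output against known answers.

    reported: dict mapping a source account to the account the matcher linked it to
              (accounts the matcher declined to link are simply absent).
    truth:    dict mapping every source account to its real counterpart.
    """
    correct = sum(1 for src, dst in reported.items() if truth.get(src) == dst)
    precision = correct / len(reported) if reported else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: 200 people who each have one Twitter and one Facebook account.
truth = {f"tw{i}": f"fb{i}" for i in range(200)}

# The matcher only reports 61 links: 58 right, 3 wrong. It stays silent on the rest.
reported = {f"tw{i}": f"fb{i}" for i in range(58)}
reported.update({"tw58": "fb199", "tw59": "fb198", "tw60": "fb197"})

precision, recall = precision_recall(reported, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A matcher that stays silent when it is unsure trades recall for precision, which is why a 29 percent match rate can coexist with very few false matches.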
Previously devised methods have achieved much higher recall rates, sometimes up to 90 percent, but only by using small sample sizes, according to a paper the researchers posted to the arXiv preprint server this week. With larger, more representative data sets, these same approaches yielded just 19 percent recall in tests.
The risks posed by a mismatch are real. "People search engines" like Spokeo link people's social media accounts across different sites and provide that data to their business clients. In 2012, Spokeo paid $800,000 to settle Federal Trade Commission charges that included not ensuring that the information it provided its clients was accurate, and causing "actual harm" to the affected people.
Some of these services, like PeekYou, claim to be able to match profiles across different networks with more than 99.5 percent accuracy, but only when taking into account information usually hidden from view, like email addresses. When they use only publicly available information, such as age, name, and profile photo, their accuracy is anybody's guess, the researchers say.
"It is hard to know exactly what all these companies do, because their methods are not public," Renata Teixeira, one of the study's authors, told me via email. "Our results do show that if they are only using public profiles of users, then they should have either very low recall or lots of false matches."
To make this whole process a bit less sketchy—and to try to meet the real challenges of linking profiles across the massive, and noisy, datasets of social media sites—the researchers devised a whole new way to look at the kinds of information considered when matching profiles, called ACID.
This acronym stands for availability (is the information viewable on multiple profiles?), consistency (is the information the same?), non-impersonability (how easy is a name to spoof versus an entire friends list?), and discriminability (how unique is the information compared to other, similar profiles?).
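Two of these properties lend themselves to a quick illustration. The sketch below is one possible way to put numbers on availability and discriminability for a handful of made-up profiles; the scoring formulas, function names, and data are assumptions for the sake of the example, not the definitions used in the paper.

```python
from collections import Counter

# Hypothetical public profiles; None means the field is hidden or empty.
profiles = [
    {"name": "Alex Smith", "location": "Berlin", "photo": "a.jpg"},
    {"name": "Alex Smith", "location": None, "photo": "b.jpg"},
    {"name": "Maria Lopez", "location": "Paris", "photo": None},
    {"name": "Chen Wei", "location": "Paris", "photo": "c.jpg"},
]


def availability(profiles, attr):
    """Share of profiles on which the attribute is publicly visible."""
    visible = [p for p in profiles if p.get(attr) is not None]
    return len(visible) / len(profiles)


def discriminability(profiles, attr):
    """Among visible values, the share that no other profile has (assumed measure)."""
    values = [p[attr] for p in profiles if p.get(attr) is not None]
    if not values:
        return 0.0
    counts = Counter(values)
    return sum(1 for v in values if counts[v] == 1) / len(values)


for attr in ("name", "location", "photo"):
    print(attr, availability(profiles, attr), round(discriminability(profiles, attr), 2))
```

In this toy set, names are always available but only half of them are unique, which is the digital doppelgänger problem in miniature.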
Using these properties, the team got its result by focusing on what it calls the "special matching problem," in which a profile can have only one match on another social network instead of many. Facebook and LinkedIn, for example, require that users maintain a single profile. The "general matching problem," in which a user might have many pseudonymous profiles on a given network like Twitter, still proved too difficult.
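The one-match-only constraint can be sketched in a few lines. The example below is not the researchers' algorithm: it compares display names with Python's standard `difflib`, and the `threshold` and `margin` parameters, the function names, and the profiles are all invented for illustration. It links a profile only when the best candidate is both a strong match and clearly ahead of the runner-up, and it never assigns the same target profile twice.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_one_to_one(sources, targets, threshold=0.85, margin=0.1):
    """Link each source profile to at most one target, and each target at most once.

    sources, targets: dicts mapping a profile id to a display name.
    """
    candidates = []
    for src_id, src_name in sources.items():
        scored = sorted(
            ((similarity(src_name, name), tgt_id) for tgt_id, name in targets.items()),
            reverse=True,
        )
        if not scored:
            continue
        best_score, best_id = scored[0]
        runner_up = scored[1][0] if len(scored) > 1 else 0.0
        # Skip weak matches and ambiguous ones (two near-identical candidates).
        if best_score >= threshold and best_score - runner_up >= margin:
            candidates.append((best_score, src_id, best_id))

    matches, used = {}, set()
    for _, src_id, tgt_id in sorted(candidates, reverse=True):
        if tgt_id not in used:  # enforce one match per target profile
            matches[src_id] = tgt_id
            used.add(tgt_id)
    return matches


# Hypothetical profiles: two different people share a name on the target network.
twitter = {"tw1": "maria lopez", "tw2": "alex smith"}
facebook = {"fb1": "Maria Lopez", "fb2": "Alex Smith", "fb3": "Alex Smith"}

matches = match_one_to_one(twitter, facebook)
print(matches)
```

Here the matcher declines to link "alex smith" because two Facebook candidates are indistinguishable by name alone, which lowers recall but avoids a false match.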
Clearly, this work poses myriad privacy issues. Is linking databases like this ethical, or even really advisable? The researchers don't touch on this at all, stating that the question lies outside the scope of the paper.
Still, at the very least, maybe in the future, research like this will ensure your boss doesn't mistake you for a beer-swilling monster truck enthusiast who happens to have your name.