Facebook’s “shadow profiles” were just the tip of the iceberg.
Image: Quinn Dombrowski/Flickr
We generally think of data privacy in individual terms: what can I do to protect my data? Less common is the suggestion that online privacy isn't an individual concern at all. New research suggests that your friends' online connections can give away private details like your sexual orientation, even if you're not on a social networking site yourself.
Social networks like Facebook and Twitter thrive on connection and interaction. Relationships have to be built and maintained through conversation, retweets, tags and photos. As the network grows, so does its predictive power, in aggregate.
The intertwining mesh of online relationships and the many bits of information that circulate within it can reveal a lot about its various nodes—that is, individual users. Last year, Facebook received flak for creating "shadow profiles" of users, which contained information like phone numbers, gleaned from the interactions of friends on the site.
According to a group of researchers at ETH Zurich, Facebook's implementation of shadow profiles is just the tip of the iceberg when it comes to what social networks can indirectly reveal about us.
"On the aggregated level, this leads to an imbalance between the knowledge that a single user has about the [social network] provider and the knowledge that the provider has, or is able to deduce, about individual users and even about persons that are not users," the researchers write.
In a new study available on the arXiv preprint server, which will be presented at the 2014 Conference on Online Social Networks next week, the researchers determined, mathematically, how accurately their own shadow profiles could reveal the sexual orientation of individual users on Friendster. They called these "partial shadow profiles." They also tested profiles built for people who weren't on the site at all, which they called "full shadow profiles."
Before Friendster's life as a social network ended (it's now a gaming site), it was crawled by the Internet Archive, providing researchers with a snapshot of a social network in full swing.
For the purposes of the study, the researchers only considered the first 20 million US-based users of the site, roughly 3 million of whom had profiles that listed their contacts, age, and gender. Altogether, their dataset contained roughly 11 million indirect friend connections between the users.
A map of the Friendster network as it stood. The red lines indicate connections between users of the same sexual orientation.
Previous studies have shown that gay men can be detected on a social network by the number of publicly gay friends they have. Building on these findings, the researchers were able to model who was gay, straight, or bisexual on Friendster by training a machine learning algorithm on the connections between users who made their gender and romantic interests public.
After feeding the algorithm user profiles that did not make their orientation (gender plus interest) public, and then crunching the numbers, the researchers concluded that most profiles had a higher "privacy leak factor" the larger their immediate network was. Gay men were at the highest risk of having their orientation indirectly revealed, according to the results.
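To make the idea of inferring a hidden attribute from a user's friends concrete, here is a minimal sketch. It is not the researchers' model: it swaps in a simple majority vote over whatever a user's friends have disclosed, and the function name, the data layout, and the placeholder labels "A" and "B" are all assumptions made for illustration.

```python
from collections import Counter


def infer_label(user, friends, disclosed):
    """Guess a hidden label for `user` from labels their friends made public.

    friends:   dict mapping a user id to a set of friend ids (hypothetical data)
    disclosed: dict mapping a user id to a publicly stated label
    Returns (label, share_of_disclosing_friends), or None when no friend
    of `user` has disclosed anything.
    """
    labels = [disclosed[f] for f in friends.get(user, ()) if f in disclosed]
    if not labels:
        return None
    # Majority vote; a tie is broken arbitrarily by whichever label was seen first.
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Hypothetical toy network: "dana" has disclosed nothing, three friends have.
friends = {"dana": {"ali", "bo", "cy", "eve"}}
disclosed = {"ali": "A", "bo": "A", "cy": "B"}
guess = infer_label("dana", friends, disclosed)
```

Even in this toy version, the guess gets firmer as more of a user's friends disclose their own details, which is the dynamic the study measures.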
"This suggests that homosexual male users that do not disclose their sexual orientation are at a larger risk of privacy leakage if the tendency of other users to share their sexual orientation becomes stronger," the authors wrote.
Though this finding is both impressive and unsettling, it's not all that different from what social networks like Facebook already do with shadow profiles, albeit with more sensitive information. Next, however, the researchers tested their ability to predict the sexual orientation of people not on the network using a similar approach. In other words, they built shadow profiles for people who really have no say or complicity in the matter at all.
"The idea is, when a user shares its contact list with the OSN [Online Social Network], the provider can find out which email addresses do not have an associated account and can generate a full shadow profile for these non-users," the researchers wrote. "If those non-users appear in many contact lists of OSN users, data mining techniques can be used to infer the home location, age, gender, etc., of the non-users."
Using this basic technique, the researchers modelled the connections between people on the network and off, as long as the users provided their contact lists to Friendster, and calculated the statistical probability that the offline group's sexual orientation could be accurately inferred. The results were largely the same.
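The contact-list step the researchers describe can be sketched in a few lines. This is an illustrative assumption about how such bookkeeping might look, not code from the study or from any social network: the function name, the data layout, and the example.com addresses are hypothetical.

```python
from collections import defaultdict


def build_full_shadow_profiles(contact_lists, registered_emails):
    """Map each non-member email to the members whose contact lists contain it.

    contact_lists:     dict mapping a member id to a set of emails they uploaded
    registered_emails: set of emails that already have an account
    """
    shadow = defaultdict(set)
    for member, contacts in contact_lists.items():
        for email in contacts:
            if email not in registered_emails:
                # No account exists for this address, so it belongs to a non-user.
                shadow[email].add(member)
    return dict(shadow)


# Hypothetical uploads from two members.
contact_lists = {
    "ali": {"bo@example.com", "zed@example.com"},
    "bo": {"ali@example.com", "zed@example.com", "kim@example.com"},
}
registered_emails = {"ali@example.com", "bo@example.com"}
profiles = build_full_shadow_profiles(contact_lists, registered_emails)
```

Here `zed@example.com` ends up linked to both members without ever signing up. Those links are the raw material from which attributes of a non-user could then be inferred.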
"In the full shadow profiles problem, non-users are subject of losing privacy as other individuals join the OSN, potentially revealing their contacts," the authors wrote. The more friends that joined the network, the more vulnerable the offline users' information was.
The main takeaway from the study is that better-connected users face a higher risk of having the private details of their lives exposed indirectly. It follows, then, that more isolated users have a better chance of keeping their lives hidden from unwelcome, prying eyes.
Of course, social networks are "social" for a reason. It doesn't really do anybody any good to join Facebook only to remain a digital hermit. And, finally, here is the rub: as networks grow, privacy erodes. The question of how to deal with this must come next.