Can We Mine Life-Saving Health Data Without Sacrificing Privacy?

Larry Page said mining health care data could "probably save 100,000 lives." But there's a reason we're protective of it.

Jun 27 2014, 5:15pm
Healthcare data could probably save 100,000 lives next year—if we weren’t so damn fussy about our privacy. That’s the essential gist of a comment Google’s Larry Page made in an interview with Farhad Manjoo at the New York Times.

Page made the comments in response to public resistance to data-mining technologies. “For me, I’m so excited about the possibilities to improve things for people, my worry would be the opposite,” he said. “We get so worried about these things that we don’t get the benefits.” He told Manjoo, “Right now we don’t data-mine health care data. If we did we’d probably save 100,000 lives next year.”

He’s pushed a similar line before, such as in a TED appearance in March this year. But to present the issue in this black-and-white manner sets up a false dichotomy of health (survival, even, to use his hyperbole) and privacy. We shouldn’t have to choose. And we shouldn’t feel that because of the high stakes—“saving lives”—concerns over issues such as privacy invasion are invalid.

Page is probably right that data mining could save some lives. After Page’s TED talk a few months ago, TechCrunch’s Gregory Ferenstein pointed to an FDA researcher who claimed that tens of thousands of deaths from the side effects of arthritis drug Vioxx could have been prevented if hundreds of millions of health records had been available (though whether that would actually have happened is debatable, as we'll get to later).

Similar anticipated benefits are behind the UK government’s “care.data” plan, an opt-out NHS initiative that would see patient records made available to researchers. The idea is that such broad population data could help researchers pick up on trends that give insight into the causes and risk factors for conditions, side effects of treatments, and so on, and therefore save lives.

Where Page got the 100,000 figure from I’m not sure, and perhaps it was just an arbitrary figure to make his point. But the general sentiment—“if we could use more of your data, we could save lives”—smacks of Silicon Valley’s phoney do-gooder hubris. It skirts around the facts that a) the people who want to do the data mining (like, oh, I don’t know—Google, perhaps?) have a vested interest in being allowed to do so, and b) big data isn’t automatically useful to healthcare anyway. Just look at Google Flu Trends, which proved to be completely inaccurate.

But most importantly, it fails to recognise that there are good reasons why they’re currently not allowed to. The care.data rollout was postponed earlier this year for exactly these reasons: primarily, privacy concerns. Health care data is of particular concern to privacy advocates because it’s naturally sensitive, and it’s basically impossible to anonymise. When Page and others talk about analysing healthcare data “anonymously,” they gloss over the fact that, even without an individual’s key personal details, health data can be quite revealing.

Take me, for instance. I happen to have a medical condition—nothing exciting, but it’s relatively unusual, with an incidence of one case in over 200,000 population per year. If you had my date of birth and my gender, you’d narrow my identity down a lot further because it also usually presents in people several decades older than me, and it’s more prevalent in men than women. Start adding in other details my doctor would have, even without revealing obviously identifying information like my name, and the net starts to close in.

Then there’s the fact that other places outside of health care authorities—insurers, for instance—may hold other data, making re-identification even more possible. If one dataset has your date of birth but not your address, for instance, but another has the inverse, the two could be compared to fill in the gaps. 
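
To make that concrete, here is a minimal sketch of how such a comparison might work. Everything in it is invented for illustration: the records, the field names (`birth_year`, `sex`, `postcode_area`) and the `link` helper are assumptions, not anything drawn from a real NHS or insurer dataset. It assumes the two datasets share a few seemingly harmless fields, and it simply checks whether those fields point to exactly one person.

```python
# Hypothetical, made-up records. The "anonymised" health data has no names;
# the second dataset (say, an insurer's) has names but no diagnoses.
health_records = [
    {"birth_year": 1988, "sex": "M", "postcode_area": "N1", "condition": "rare condition X"},
    {"birth_year": 1951, "sex": "M", "postcode_area": "N1", "condition": "rare condition X"},
    {"birth_year": 1988, "sex": "F", "postcode_area": "SE5", "condition": "asthma"},
]

insurer_records = [
    {"name": "Person A", "birth_year": 1988, "sex": "M", "postcode_area": "N1"},
    {"name": "Person B", "birth_year": 1988, "sex": "F", "postcode_area": "SE5"},
    {"name": "Person C", "birth_year": 1988, "sex": "F", "postcode_area": "SE5"},
]

# Fields that are not identifying on their own but appear in both datasets.
QUASI_IDENTIFIERS = ("birth_year", "sex", "postcode_area")


def link(health, other, keys=QUASI_IDENTIFIERS):
    """Return (name, condition) pairs where the shared fields match exactly one person."""
    matches = []
    for h in health:
        candidates = [o for o in other if all(o[k] == h[k] for k in keys)]
        # Only an unambiguous match counts as a re-identification.
        if len(candidates) == 1:
            matches.append((candidates[0]["name"], h["condition"]))
    return matches


reidentified = link(health_records, insurer_records)
print(reidentified)  # [('Person A', 'rare condition X')]
```

In this toy data, only the first health record is re-identified: the second has no counterpart in the other dataset, and the third matches two people, so it stays ambiguous. The rarer the combination of details, the more likely a record is to match just one person.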

That’s why health data mining is such a privacy concern. And it’s not just privacy for the sake of privacy, either (though that is itself a perfectly reasonable expectation). There’s also the fear that, should health data be made available, it could be used to discriminate. The classic example is the concern that insurance companies could use information about an individual’s health to raise their premium.

Now, I agree with Page that this fear shouldn’t blindly stand in the way of progress; but it certainly deserves thorough attention before health data-mining gets a free pass. You can’t put the genie back in the bottle once it’s out in the digital ether, perhaps sitting on some Google-supported cloud.

We need to consider exactly what data should be mined, who should have access to it and, generally, what safeguards are needed. And, as the NHS ultimately conceded, we need to make sure people are well-informed about both the benefits and risks, and give them a proper chance to opt in or out.

The point is, we shouldn’t feel that in order to save lives, we need to sacrifice privacy. There’s a middle ground to be found that strikes the right balance, one that lets Google and whoever else play Superman without completely giving up on the other values our societies deem important.

How much we want to compromise depends largely on the extent of the risks and the pay-offs, and the latter is perhaps overhyped by Page and his pals. Releasing millions of health records won’t do anything on its own to save lives; the potential lies in what we can do with them. Over at the Conversation, cybersecurity lecturer Eerke Boiten argues that we’re just not there yet, and notes that Page’s statement echoes the NHS’s claims that care.data could prevent child deaths. “It is only big data, not magic,” writes Boiten. “Preventing child deaths appears to be brought in as emotional blackmail, expected to trump the valid concerns over the NHS' big data plans.”

It’s a bit unclear in Page’s wording when he said that “we’d probably save 100,000 lives” whether he meant “we” as in his company specifically, or “we” as in the human race in general. Certainly it’s clear that Google likely has an interest in gathering this kind of data. Collecting user data is pretty much their MO, after all, and with the mysterious Google Calico project focusing on “health and well-being, in particular the challenge of aging and associated diseases,” it wouldn’t be surprising if they had health records in their sights.

Then, of course, there's Google Fit, announced at this week's I/O. Details about the new initiative are thin on the ground, but the idea seems to be to let apps and wearable tech share your fitness data with each other. That suggests Google could get into the business of collecting or at least aggregating health data, not just mining it.

Moving forward, there’s no doubt big data has a role to play in the future of healthcare. But exactly what that role is requires some careful discussion, not just sensationalist claims from a data-hungry tech giant.