This story is over 5 years old.

All the Data Tech Companies Collect Still Can't Rival the Old-School Census

Statisticians need better quality data than your Facebook 'likes.'
"There is nothing that can replace a census," said criminology professor Ron Melchers. Photo via Eric Fischer/Flickr

Every day, we hand millions of pieces of information over to Silicon Valley behemoths that know what we buy, who our friends are, and what weird medical conditions we look up late at night. And yet, when it comes to collecting serious population data, we still rely on a tool that's thousands of years old: the census.

The Egyptians, Babylonians, Chinese, and Romans all conducted regular censuses to figure out how many people lived in their cities, what property they owned, and what kind of livestock they raised. Given how much today's companies know about us, a census seems an oddly archaic way to collect those kinds of statistics in 2015.

But for all their insights, even the likes of Facebook and Google still don't have the kind of information statisticians trust to give them a true cross-section of society.

Canada has conducted a regular census almost since its founding. But in the 1970s the federal government introduced a longer census questionnaire with more detailed demographic questions that was sent out to about a fifth of all households countrywide. While everyone still had to complete a short census which merely counted population, it was the "long-form" results that provided the kind of granular data that all levels of government relied on when formulating policies on poverty, transportation, education, and much more.

The most recent long-form census results, however, are from 2006. Canada's Conservative Party scrapped the mandatory questionnaire in 2010, citing privacy concerns, and replaced it with a voluntary National Household Survey that went out in 2011, much to the horror of statisticians.

Put simply, a voluntary survey means you have no idea who's responding and who's opting out, making the data rather unreliable beyond broad strokes.
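To make the self-selection problem concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the two income levels, the 40/60 population split, and the response rates are hypothetical numbers, not figures from the census or the National Household Survey. It compares a mandatory random sample of one fifth of households with a voluntary survey in which higher-income households are assumed to answer more often.

```python
import random
from statistics import mean


def simulate(n_households=100_000, seed=0):
    """Compare a mandatory random sample with a voluntary survey.

    All numbers below are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)

    # Hypothetical population: about 40% lower-income, 60% higher-income.
    population = [
        30_000 if rng.random() < 0.4 else 90_000
        for _ in range(n_households)
    ]
    true_mean = mean(population)

    # Mandatory long form: a random fifth of households, and all of them answer.
    mandatory = rng.sample(population, n_households // 5)

    # Voluntary survey: a random fifth is invited, but the chance of
    # answering depends on income (assumed rates, not real ones).
    invited = rng.sample(population, n_households // 5)
    response_rate = {30_000: 0.5, 90_000: 0.8}
    voluntary = [x for x in invited if rng.random() < response_rate[x]]

    return true_mean, mean(mandatory), mean(voluntary)


if __name__ == "__main__":
    true_mean, mandatory_mean, voluntary_mean = simulate()
    print(f"true mean income:      {true_mean:,.0f}")
    print(f"mandatory sample mean: {mandatory_mean:,.0f}")
    print(f"voluntary survey mean: {voluntary_mean:,.0f}")
```

Under these assumptions the mandatory sample lands close to the true average, while the voluntary survey overstates it, and collecting more voluntary responses does not shrink that gap. The skew comes from who answers, not from how many do.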

An attempt to resurrect Canada's mandatory long-form census was voted down in Parliament earlier this month, which means everyone from policymakers and think tanks to academic researchers and charities will have to make the best of the shoddy numbers available to them. But with all the information people broadcast to the world each day—with their web browsers, smartphones, and even their purchases at the grocery store—could "big data" come to the rescue, ending the need for paper forms to be sent to people's homes every few years?

It's an intriguing question, especially since it seems companies such as Facebook, Google, and Amazon know about as much about people as any government statistician ever could. Most major tech firms have in-house research teams, and we know Facebook already closely tracks its users' names, locations, and total numbers. Google has published research on improving "web-based data collection" to create representative samples of the general population, and as Motherboard reported in 2014, researchers in Portugal and France have mapped population density using mobile phone records. Using new technology to find census-like information is hardly novel.

But researchers say the data harvesting tricks of Silicon Valley are still a long way from replacing a detailed census. And if there's any hope at all, it likely won't come from the private sector.

"The key thing in the long-form census was that you have a very well defined sampling scheme for collecting that information so that you've ensured that you're going to get a representative collection of long forms returned," John Petkau, president of the Statistical Society of Canada and statistics professor at the University of British Columbia, told Motherboard. "If you don't get a representative sample, you're going to get results which don't describe the typical behaviour of the population as a whole, which is what you're trying to do presumably."

Petkau was one of many who urged the government to bring back the mandatory census, even though the voluntary National Household Survey that replaced it in 2011 asks many of the same questions. He says that, to statisticians, it's not even particularly important what the census asks—as long as researchers can be sure the sample is representative of the country at large.

"As statisticians we don't care. We don't care what the questions are. We care about the quality of the data that you're going to get as a result of your sampling scheme," he said. And ultimately, that's not what data mining by private companies is geared towards. Gathering useful insights about poverty or the labor market, or trying to chart the ethnic makeup of a specific community over time—there's no money in that.

But there is an alternative approach to gathering useful statistics that is already in place in countries across the world, one that is arguably more useful for policy makers and researchers than even the mandatory long-form census. Governments already collect reams of data about their citizens each day, from birth and death certificates to educational records to tax returns. Pretty much everything that goes through the government bureaucracy leaves a paper trail, and this "administrative data" can be very valuable if collected properly.

Several European countries, mostly in central and northern parts of the continent, have incredibly detailed sociological data from existing government records. Finland doesn't even conduct a conventional census anymore: it has drawn its population count exclusively from administrative data since 1990.

Ron Melchers, who teaches criminology at the University of Ottawa, has worked with administrative data in his native Netherlands and says the amount of granular detail available to researchers is astounding.

"They register everyone at birth, they collect and they link data sets across different purposes throughout people's entire lives," he told Motherboard. "So they don't have just birth records, but also residency records, they have employment records, health records, school records, everything."

Not only is administrative data easier to gather—there's no worry about non-response rates—but it also results in a better understanding of certain subjects. It's very hard to know, for example, what percentage of society has been incarcerated because most people prefer not to divulge that in surveys. By linking data from the corrections system to other government records, however, criminologists get a much clearer idea of what percentage of the population has been behind bars.
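The linkage idea can be sketched in a few lines. The example below is a toy illustration with invented person IDs and invented records, not a description of how any statistical agency actually links its files. It assumes a hypothetical population register, a hypothetical set of corrections records, and a hypothetical survey in which one person with a record declines to say so.

```python
# Hypothetical population register: person_id -> year of birth.
register = {
    1: 1970, 2: 1982, 3: 1965, 4: 1990, 5: 1978,
    6: 1988, 7: 1959, 8: 1995, 9: 1973, 10: 1984,
}

# Hypothetical corrections records, keyed by the same person_id.
# ID 11 is not in the register (say, a non-resident), so linkage drops it.
corrections_ids = {2, 5, 7, 11}

# Hypothetical survey: only six people responded, and person 2
# chose not to divulge a past incarceration.
survey_answers = {1: False, 2: False, 3: False, 5: True, 6: False, 8: False}


def share_from_linkage(register, corrections_ids):
    """Share of registered residents who appear in corrections records."""
    linked = register.keys() & corrections_ids
    return len(linked) / len(register)


def share_from_survey(answers):
    """Share of survey respondents who report having been incarcerated."""
    return sum(answers.values()) / len(answers)


if __name__ == "__main__":
    print(f"linked administrative records: {share_from_linkage(register, corrections_ids):.0%}")
    print(f"self-reported in survey:       {share_from_survey(survey_answers):.0%}")
```

In this made-up data the linked records put the share at three in ten, while the survey suggests about one in six. That gap comes from non-response and under-reporting, which is the effect Melchers describes.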

"It's much higher than anyone thinks it is," said Melchers.

Statistics Canada, which administers the census, declined to make available any researchers or spokespersons for this article. The statistical agency says on its website that it already relies on administrative data for some things, but it's unclear how deep that goes.

And while it's tempting to think that StatsCan could flip a switch and replace the information it lost from the abolished long-form census with existing government data, such a shift would take a long time. European countries that have such sophisticated administrative data regimes in place developed them over decades, sometimes centuries.

"They're wonderful record-keeping systems, but you couldn't create a system like that from nothing," said Melchers, cautioning that it could take as much as three generations before Canada could have anything comparable to Switzerland or the Netherlands.

Cradle-to-grave record-keeping would also represent a much larger intrusion into Canadians' privacy—and privacy, after all, was one of the chief reasons the government cited for ditching the mandatory long-form census in 2010. Besides, the whole thing might just be too efficiently… European.

"You have to work within the traditions of the country that you're in," said Melchers. "In Canada, the US, other countries that have a census in the past, it means that you're looking at a census. There is nothing that can replace a census, quite frankly."