A new paper casts doubt on 40,000 studies.
It's often said, a bit hyperbolically perhaps, that the human brain is the most complex structure in the known universe. For thousands of years, we've wanted to peep inside and see what it's doing, whether through trepanation, through CT scans, or through a technique that's been in favour over the last two decades or so: functional magnetic resonance imaging (fMRI), which measures blood flow (and, indirectly, activity) inside the brain.
fMRI is an amazing tool that's spawned a whole new field of research, not to mention an endless series of studies on how the brain works. It's allowed researchers to draw conclusions about drug addiction, about human empathy for robots, even about how we tend to respond to poetry and prose, to name just a few examples from a very long list.
But there's a problem. Potentially, it's a very big one: The statistical methods on which these studies are based could be seriously flawed, according to Anders Eklund of Linköping University in Sweden. In a new paper, published in PNAS, Eklund and co-authors reviewed methods commonly used in fMRI studies, testing them with a large set of data from real humans. They looked at three software systems used in fMRI analysis, and found that these could result in false positive rates of up to 70 percent, potentially indicating activity in the brain where there was none. The methods are supposed to hold the false positive rate at 5 percent.
This calls into question some 40,000 fMRI papers that have been published since 1992, the paper notes. In other words, it's a red flag for an entire field of research.
"Despite the popularity of fMRI as a tool for studying brain function, the statistical methods used have rarely been validated using real data," the paper states. Validations have been done instead with simulated data, but that isn't really the same thing—and can't perfectly mirror the "noise that arises from a living human subject in an MR scanner," it continues.
In the study, the authors looked at resting-state fMRI data from 499 healthy people, taken from various databases around the world. They split the subjects into groups of 20 and compared the groups against each other, making a total of three million comparisons of randomly selected groups. Among these healthy controls, "you shouldn't find any differences," Eklund told me, or at least none beyond the expected 5 percent false positive rate. But the rate of apparent differences was often much higher than that, meaning the methods could be flagging positives where there were none.
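The logic of that null comparison can be sketched in a few lines of Python. This is only a toy illustration, not the authors' analysis: the subject values are synthetic random numbers rather than brain scans, each comparison is a single two-sample t-test rather than the cluster-wise inference the paper evaluates, and the function names, seed, and number of comparisons are all assumptions made for the example. The point it shows is that when groups are drawn at random from one healthy pool, a well-behaved test should flag a "difference" in roughly 5 percent of comparisons.

```python
import math
import random
import statistics

# Two-sided 5% critical value of Student's t with 38 degrees of freedom
# (two groups of 20 subjects each: 20 + 20 - 2 = 38).
T_CRIT_DF38 = 2.024


def pooled_t(a, b):
    """Two-sample t statistic using a pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb)
    )


def null_false_positive_rate(n_subjects=499, group_size=20,
                             n_comparisons=5000, seed=0):
    """Fraction of random group-vs-group comparisons declared significant.

    Every subject comes from the same synthetic population, so any
    "significant" difference between two groups is a false positive.
    """
    rng = random.Random(seed)
    # Hypothetical stand-in for one measurement per healthy subject.
    subjects = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
    hits = 0
    for _ in range(n_comparisons):
        # Draw 40 distinct subjects and split them into two groups of 20.
        picked = rng.sample(subjects, 2 * group_size)
        t = pooled_t(picked[:group_size], picked[group_size:])
        if abs(t) > T_CRIT_DF38:
            hits += 1
    return hits / n_comparisons


rate = null_false_positive_rate()
print(f"empirical false positive rate: {rate:.3f}")
```

In this idealized setting the printed rate should land close to 0.05. The paper's finding is that, on real resting-state data, some commonly used fMRI methods landed far above that mark.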
One of the problems they pinpointed was a bug in one of the software packages that had apparently been there for 15 years. (It was fixed in May 2015, the paper says, while the study manuscript was being prepared.) But a more insidious problem, Eklund told me, is that it's been very difficult to validate these methods until now, and this might be the first time they've been tested this rigorously. Yet countless researchers have relied on them.
One major reason, he argued, boils down to cost: fMRI scans are notoriously expensive. (Performing a scan can cost upwards of US$600 an hour, and the price tag for a single machine can run as much as $3 million.) That makes it hard for researchers to perform large-scale studies with lots of patients, so they "normally have [just] 20 or 30 subjects," Eklund said.
For this paper, he found a way around that problem: Over the past few years, some groups have started sharing their fMRI data for free. These data-sharing initiatives have made it possible to perform statistical analysis on lots of data, the paper says. "I think we saved maybe $1 million just by downloading [patient data] for free," he told me.
Another reason it's been hard to validate is that, until recently, computers haven't been up to the task—at least, not quickly enough to be very useful. "It could have taken a single computer maybe 10 or 15 years to run this analysis," he said, "but today, it's possible to use a graphics card." That lowered the processing time "from 10 years to 20 days."
Eklund must have realized that his results would make waves, because before the paper came out in PNAS, he shared his findings on pre-print servers, opening them up to any objections or double-checking from other researchers in the field.
He also sent his conclusions to the makers of the three software packages, "and said this is what we have done, this is the code, and please check we have done everything correctly," he said. So far, one has introduced changes, while another reviewed the team's work and published a response, he noted.
It's impossible to know exactly how many of some 40,000 studies using fMRI are potentially flawed, and it's not feasible to redo all of them, the paper says. The authors call on the fMRI community to focus on validating methods that are in use today. Even so, these findings come at a difficult time: Scientists are grappling with a reproducibility crisis, in which they're finding that results can't always be replicated. The peer review system is facing its own scrutiny.
Science exists through a series of checks and balances. "We are always revising and correcting work," said Eklund, who sees his analysis as part of that process.
If anything, this latest finding makes a powerful argument that a great many more researchers should share their data for free: without this dataset, Eklund's analysis, and whatever changes might arise from it, wouldn't have been possible.