How Bad Software Leads to Bad Science
A new survey of UK scientists indicates that some researchers are building software without any training.
Software that can crunch data faster than any researcher is as much a part of science these days as petri dishes and the occasional bout of megalomania. Researchers are even designing their own bespoke programs, but not every scientist is a programmer, and bad software produces bad science.
A new survey of 417 randomly selected UK researchers, published today by the Software Sustainability Institute (SSI), reports that 70 percent of respondents believe they could not practically continue their research without the aid of software. Fifty-six percent of respondents design their own software, and 20 percent of those scientists do so without any training in software engineering whatsoever.
"It's a terrible concern, because you can work your way through software development—researchers are intelligent people, they can work this stuff out—but you can't build software that is reliable," Simon Hettrick, deputy director of the SSI, told me. "If you're producing your results through software, and your software doesn't produce reproducible results, then your research results aren't reproducible."
Bespoke software is used at nearly all levels of science, Hettrick told me. The tasks range from something as simple as generating a graph, which might need a specialized program, all the way up to aggregating massive amounts of data for scientists to dig through and make connections.
Problems arise when that software is designed by researchers who really don't know what they're doing when it comes to coding. A single mistake in the code can lead to a result that appears innocuous enough, but is actually incorrect.
In 2006, five papers that appeared in Science and two other journals were retracted due to an error produced by a homemade piece of software. The research group, led by Geoffrey Chang, thought they had identified a new protein structure, but their discovery was only made possible through a mix-up in their bespoke data analysis program that effectively inverted their results.
Poorly designed software that is passed between research groups and used uncritically is also a concern. A 2013 study on the use of modeling software among researchers who study species distributions found that just eight percent of the surveyed scientists validated their software of choice against other methods. Many of these programs were recommended by colleagues or picked up by word of mouth.
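What validating a tool "against other methods" can look like in practice is fairly simple. The Python sketch below is an illustration only, not anything drawn from the survey or the 2013 study: `homemade_std` and `agrees_with_reference` are hypothetical names, and the measurements are made-up numbers. It checks a hand-written calculation against an independent, well-tested implementation from Python's standard library.

```python
import math
import statistics


def homemade_std(values):
    """Hypothetical hand-written sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_diffs / (n - 1))


def agrees_with_reference(values, rel_tol=1e-9):
    """Return True if the homemade result matches the standard library's result."""
    reference = statistics.stdev(values)  # independent implementation
    return math.isclose(homemade_std(values), reference, rel_tol=rel_tol)


# Made-up measurements, used purely for illustration.
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(agrees_with_reference(measurements))
```

Agreement on a handful of inputs does not prove a program correct, but a disagreement is a cheap and early warning that something in the homemade code deserves a closer look.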
The use of powerful software across nearly all areas of research is only poised to increase in the near future. The ability to crunch more numbers, more accurately and with greater speed, is a boon to researchers, who can then spend more of their time thinking creatively about the data their work produces.
"Software lets you do so much more in the amount of time that would allow the average human to conduct an entire research career; it's about empowering researchers to do more," Hettrick told me.
There's no doubt that awareness of poorly designed software has risen in the scientific community as a result of these studies and professional horror stories. Yet the recent SSI survey indicates that some researchers, potentially a non-trivial number, are still relatively ignorant regarding their own digital tools.
"Training for researchers is very important," Hettrick said. "We think that software training, a basic level of software engineering and development, should be in all doctoral schools so that they're producing a research community with a basic understanding of how software works."