
Astronomy’s Looming Big Data Problem Has a New Solution

Scientists offer an algorithm capable of matching human pattern recognition skills.
SKA artist's impression. Image: SKA Organisation

When completed a decade from now, the Square Kilometer Array will consist of thousands of small antennas concentrated in South Africa, Australia, and New Zealand, spanning a distance of at least 3,000 kilometers. Together, those antennas will be roughly equivalent to a single telescope with a total collecting area of about one square kilometer. The SKA will be 50 times more sensitive than any other radio telescope in existence, offering the highest resolution in all of astronomy. And it will be blazing fast, able to survey the sky 10,000 times faster than ever before.


There's a catch, however. A big one. The Square Kilometer Array is as much an experiment in data as it is in astronomy. To count for anything, all of that hardware needs to be linked; otherwise, it's just a spew of independent radio antenna feeds. That will require massive central supercomputers to sort out and process the data deluge, as well as an unprecedented network infrastructure. The SKA organization has estimated that the amount of data shuttled around the Array and its swarms of instruments will be roughly 10 times the traffic currently coursing through the global internet.

As described in a study out this week in The Astronomical Journal, a group of researchers at the University of Wisconsin has come up with a new way of handling the forthcoming data hurricane. In particular, their solution has to do with detecting hydrogen.

"The Square Kilometer Array (SKA) and its pathfinder telescopes … will push radio astrophysics into a new era of 'big spectral data' by providing scientists with millions of high spectral resolution, high-sensitivity radio emission and absorption spectra probing lines of sight through the Milky Way and neighboring galaxies," the group, led by UW postdoc fellow Robert Linder, wrote. "This infusion of data promises to revolutionize our understanding of the neutral ISM. However, these new data will bring new challenges in data interpretation."


The problem, in Lindner's words, can be reduced to the question, "how many clouds are behind the pixel?" That is, we can look at some pixel of an image and see hydrogen, but it's exceedingly difficult for a computer to look at that pixel and say whether it's a single hydrogen cloud layer or many. People can usually sort this out because we're visual creatures adept at pattern recognition, but, clearly, the scale of the SKA operation requires computer automation.
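To see why that's hard, consider a toy illustration (not from the paper): two hydrogen clouds at slightly different velocities can blend, in the spectrum behind a single pixel, into something that looks like one broad line. A minimal sketch in Python, assuming idealized Gaussian line profiles and made-up numbers:

```python
import numpy as np

def gaussian(v, amp, center, width):
    """A single idealized Gaussian line profile over the velocity axis v."""
    return amp * np.exp(-0.5 * ((v - center) / width) ** 2)

# Hypothetical velocity axis in km/s
v = np.linspace(-50, 50, 500)

# Two overlapping clouds at slightly different velocities...
two_clouds = gaussian(v, 1.0, -4.0, 8.0) + gaussian(v, 0.8, 4.0, 7.0)

# ...versus one single, broader cloud
one_cloud = gaussian(v, 1.56, -0.5, 9.5)

def count_peaks(y):
    """Count local maxima in a 1-D profile."""
    return int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))

# Both profiles show exactly one bump, so simply counting peaks
# can't tell one cloud from two blended ones.
print(count_peaks(two_clouds), count_peaks(one_cloud))  # -> 1 1
```

Untangling blends like that, millions of times over, is exactly the kind of automation the SKA will demand.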

This might come in the form of "Autonomous Gaussian Decomposition," a new algorithm devised by the UW group. It uses machine learning to guide a computer program through a series of optimized guesses as it decomposes each signal, pixel by pixel, into its different spectral components, giving detailed probabilistic information about each one.
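As a rough sketch of what that decomposition step looks like in practice (this is not the authors' code, and every number below is invented for illustration), a blended spectrum can be fit as a sum of Gaussians with SciPy once initial guesses are in hand; producing good guesses automatically is the part the new algorithm handles:

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(v, amp, center, width):
    return amp * np.exp(-0.5 * ((v - center) / width) ** 2)

def multi_gaussian(v, *params):
    """Sum of Gaussians; params is a flat sequence of (amp, center, width) triples."""
    model = np.zeros_like(v)
    for i in range(0, len(params), 3):
        model += gaussian(v, params[i], params[i + 1], params[i + 2])
    return model

# Synthetic spectrum: two blended components plus noise (hypothetical values)
rng = np.random.default_rng(0)
v = np.linspace(-50, 50, 500)
spectrum = gaussian(v, 1.0, -5.0, 8.0) + gaussian(v, 0.7, 6.0, 5.0)
spectrum += rng.normal(0.0, 0.02, v.size)

# Initial guesses for (amp, center, width) of each component; deciding how many
# components to fit and roughly where they sit is the hard part being automated.
guesses = [1.0, -8.0, 6.0, 0.5, 8.0, 6.0]

popt, pcov = curve_fit(multi_gaussian, v, spectrum, p0=guesses)
perr = np.sqrt(np.diag(pcov))  # 1-sigma uncertainties on each fitted parameter

for i in range(0, len(popt), 3):
    print(f"amp={popt[i]:.2f}±{perr[i]:.2f}  "
          f"center={popt[i+1]:.2f}±{perr[i+1]:.2f}  "
          f"width={popt[i+2]:.2f}±{perr[i+2]:.2f}")
```

The fit's covariance matrix is what supplies the probabilistic information about each component; the machine-learning piece described in the study sits upstream, guiding how those guesses are produced.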

Arecibo Observatory. Image: rice.edu

Lindner and his group tested out the algorithm on some IRL astrophysical data (courtesy of the Arecibo Observatory, above), using a human interpreter as a control—remember, humans are good at this already—and comparing the results. What they found was that the algorithm was able to interpret the pixel data about as well as the human subject. Any lag or noise on the part of the algorithm would be smoothed out given a large enough data set. (And, again, the SKA data set is the very definition of large.)

"We are looking at the Milky Way, because that's what we can study in the greatest detail," Lindner offered in a statement. "But when astronomers study extremely distant parts of the universe, they need to assume certain things about gas and star formation, and the Milky Way is the only place we can get good numbers on that."

And, thanks to the algorithm, "suddenly we are not time-limited," he said. "Let's take the whole survey from SKA. Even if each pixel is not quite as precise, maybe, as a human calculation, we can do a thousand or a million times more pixels, and so that averages out in our favor."