How a Frozen Neutrino Observatory Grapples with Staggering Amounts of Data

Sensors collect terabytes of raw data every day in an attempt to discover the source of ultrahigh-energy cosmic rays.

Deep beneath the Antarctic ice sheet, sensors buried in a billion tons of ice—a cubic kilometer of frozen H2O—are searching for neutrinos. Not just any kind of neutrino, though. The IceCube South Pole Neutrino Observatory wants to discover the sources of ultrahigh-energy cosmic rays and thus solve one of science's oldest mysteries.

Just one problem. These kinds of neutrinos are really difficult to detect.

Lowered into boreholes and left to freeze in place at depths between 1,450 and 2,450 meters beneath the surface of the South Pole, IceCube's sensors collect terabytes of raw data every day.

The deployment of each of the 86 IceCube strings lasted about 11 hours. In each one, 60 sensors (called DOMs) had to be quickly installed before the ice completely froze around them. Photo: IceCube/NSF

But how does that data get processed and analyzed? As IceCube researcher Nathan Whitehorn explained, it isn't easy.

"We collect...one neutrino from the atmosphere every ~ 10 minutes that we sort of care about, and one neutrino per month that comes from an astrophysical sources that we care about a very great deal," Whitehorn wrote in an email. "Each particle interaction takes about 4 microseconds, so we have to sift through data to find the 50 microseconds a year of data we actually care about."

"We have to sift through data to find the 50 microseconds a year of data we actually care about."

Because IceCube can't see satellites in geosynchronous orbit from the pole, internet coverage only lasts for six hours a day, Whitehorn explained.

The raw data is stored on tape at the pole, and a 400-core cluster makes a first pass at the data to cut it down to around 100GB/day.

An inside look at IceCube's data center at the South Pole. Image: IceCube Collaboration

During the internet window, IceCube sends 100GB/day via NASA's TDRS satellite system to the University of Wisconsin, Madison.

"South Pole systems also try to monitor autonomously for really interesting things and send satellite phone SMS messages if they think they've got something," Whitehorn said.

Cosmic rays were first discovered in 1912 by Austrian physicist Victor Hess, whose hot-air balloon ascents that year, including one during a solar eclipse, showed that the mysterious radiation grew stronger with altitude and could not be coming from the sun.

Scientists have been working to better understand Hess's discovery ever since.

Cosmic rays bounce around a lot in space, so they don't point back to their sources. That's part of the reason their true origin has remained a mystery for so long. But their sources also produce high-energy neutrinos which, as it happens, fly straight.

"Nothing else besides these cosmic ray interactions can produce these kinds of high-energy neutrinos, so detecting a neutrino source is a totally unambiguous detection of a cosmic ray source," Whitehorn wrote.

The catch is that neutrinos are extremely difficult to detect.

Enter IceCube.

The IceCube data sets arrive at UW-Madison, where they are processed further, growing in size by roughly a factor of three.

"If the filtered data from the Pole amounts to ~36TB/year [this number was so incredible we had to double check it was not a typo -Ed.], the processed data amounts to near 100TB/year." Gonzalo Merino, the IceCube computing facilities manager at UW-Madison, wrote in an unencrypted email.

This data gets stored at UW-Madison, Merino wrote, and "all the data taken since the start of the detector construction is kept on disk so that it can be all analyzed in one go."

In total, the IceCube project is storing around 3.5 petabytes (that's around 3.5 million gigabytes, give or take) in the UW-Madison data center as of this writing.
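The yearly figures Merino quotes are roughly consistent with each other. A quick sanity check, using only the numbers reported in the article:

```python
# Sanity check on the storage figures quoted above.
# Rates are those reported in the article, used purely for illustration.

FILTERED_TB_PER_YEAR = 36   # filtered data shipped from the Pole
PROCESSING_FACTOR = 3       # processing roughly triples the volume

processed_tb_per_year = FILTERED_TB_PER_YEAR * PROCESSING_FACTOR
print(f"Processed data per year: ~{processed_tb_per_year} TB")
```

Tripling 36TB/year gives about 108TB/year, in line with the "near 100TB/year" Merino describes, and a few years of accumulation at that rate, plus the raw and intermediate products, is how an archive grows into the petabytes.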

Victor Hess discovered cosmic rays less than a year after Amundsen became the first person to reach the South Pole. Image: Public domain

A dedicated local cluster of 4,000 CPUs crunches the numbers. The storage system has to handle typical loads of "1-5GB/sec of sustained transfer levels, with thousands of connections in parallel," Merino explained.

"Keeping this data storage and data access services running with high performance and high availability I would say are one of our main challenges in the offline IceCube computing facilities," he added.

Because the IceCube data is unique and irreplaceable, the project focuses not just on performance but also on ensuring the integrity of the data over the long term. What if someone comes along in twenty years with a great idea no one's thought of yet? So the entire data set is backed up on multi-petabyte off-site tape storage facilities at two different locations around the world.

As for scientists? The hunt for the source of cosmic rays continues.

"There are still very few detections, so we are still crossing out possibilities for cosmic-ray acceleration rather than confirming them, but this is nonetheless the first direct view of the accelerators that anyone has ever had," Whitehorn said. "And more neutrinos show up every month."