We're Modeling Networks All Wrong

... according to some Notre Dame computer scientists and zebra mussels.

Current network analysis algorithms are missing or skipping over some crucial details, argues a team of computer science researchers from the University of Notre Dame in the current issue of Science Advances. The result, they say, is a widespread pattern of oversimplification profound enough to lead to erroneous results and interpretations. Naturally, the scientists have developed a new and improved alternative capable of spotting missed interdependencies.

The network problem (and solution) can be seen in the global spread of zebra mussels, an archetypal invasive species responsible for widespread devastation of native aquatic ecosystems from Sweden to the American Great Lakes, in addition to inflicting heavy damages on boats, hydroelectric facilities, and really any other underwater infrastructure that may cross their paths.

The spread of zebra mussels happens to make for a readymade real-world network example. Because they get from place to place via unwitting ship traffic, the global movements of zebra mussels nicely mirror the global shipping network. As a network graph, we would have nodes representing different ports with edges (or connections between nodes) representing shipping routes. The "weights" of the edges/routes can be viewed as measures of the interaction strength between different nodes/ports.

In a first-order network, we would find the weight of an edge connecting a pair of ports by adding up all of the trips made by ships between those ports. These weights can then be used to make predictions about future trips. So, if a route between Los Angeles and Singapore has a lot more shipping traffic than a route between Seattle and Singapore, it seems reasonable enough to predict that any given ship leaving Singapore is more likely to be going to Los Angeles.

And, indeed, this is how most network models work, the Notre Dame researchers explain—by making predictions based on port pairs (or node pairs). This seems reasonable enough, but it winds up missing some crucial detail.

"In reality, the global shipping data indicate that a ship's previous stops before arriving at Singapore influence the ship's next movement: the ship is more likely to continue on to Los Angeles if it came from Shanghai and more likely to go to Seattle if it came from Tokyo," the current study notes. "A first-order network representation fails to capture important information like this because, in every step, the flow of traffic on the network is simply aggregated and mixed."

"As a consequence, trajectories simulated on the first-order network do not follow true ship movement patterns."

Image: Chawla et al

The alternative offered in the current paper is to view different node-ports as collections of origin-destination pairs. So, instead of looking at Singpore as a single place with connections to a bunch of other places, Singapore is a pool of pairs consisting of Singapore itself coupled to all of the other places. The figure above makes it pretty clear.

In the case of zebra mussels, this is critical because viewing networks as mere combinations of singular ports and edge-routes connecting them skips over a crucial fact: zebra mussels are able to spread to the waters of intermediate ports because ballast water is always being exchanged between the ship (where zebra mussels are stowing away) and the surrounding water. This seems obvious, but it's not really accounted for by first-order networks, according to the researchers.

And this goes well beyond networks of invasive species, extending to vehicle and human movements, email correspondence, web browsing, conversations, stock markets, etc. All represented imperfectly.

In response, the researchers have crafted an algorithm for higher-order network (HON) representations, which is what's visualized above.

"We demonstrate that our novel network representation is more accurate in representing the true movement patterns in data in comparison with the conventional first-order network or the fixed second-order network," the paper concludes. "For example, when using HON instead of a first-order network to represent the global shipping data, the accuracy is doubled when simulating a ship's next movement on the network and is higher by one magnitude when simulating three steps, because the higher-order nodes and edges in HON can provide more detailed guidance for simulated movements."

Cool. The next steps are using the algorithm to detect anomalies in real-world networks and further refining it by reducing the amount of parameters needed to make it run. At the very least, we all learned a thing about network modeling.