This story is over 5 years old.

The Battle to Open Up Britain's Rail Data

When it comes to tracking delays to claim compensation, there's open data and "open" data.

Think Britain's rail network can't be any more frustrating? Commuters are used to cancelled trains, but attempts to track delays and make it easier for riders to collect the compensation they're owed are being hampered by locked-down data.

Britain's rail network was privatised from 1993, with British Rail broken up into many smaller companies. Network Rail owns the infrastructure, while private firms bid for the right to operate trains. Those train operators banded together to create National Rail Enquiries (NRE), which publishes timetables and sells tickets for all the operators, so you can book without having to go to each individual company.

NRE has released large swathes of data under its own version of the Open Government Licence, meaning developers can easily chuck together apps showing when the next train departs Paddington or the cheapest route to Wigan.

When a train is cancelled in advance, as during the annual Christmas shutdown or when particularly bad weather is expected, that data is also shared. But there's a slice that some argue isn't as open as claimed: very short-notice cancellations, when everyone's waiting on the platform and the train doesn't arrive because a driver didn't show up, an incoming service is late, or there are leaves on the tracks. Those last-minute troubles are the sort that most piss off travellers.

That data does exist: it's collected by NRE through a company called Nexus Alpha, under a system dubbed Tyrell, and was released to developers earlier this year in a more open feed called Darwin. Customers can see last-minute cancellations on the National Rail Enquiries website, but Jonathan Raper, founder of data firm Transport API, says his company has been unable to make full use of the data because it's locked down under a licence with strict terms, particularly a ban on contradicting it. As the terms state, you cannot "use Darwin Information in any End User Product that displays different train arrival and/or departure predictions to those derived from Darwin."

While this may seem a niche data set, if it were released, Transport API's developer customers could build an app that alerts users when they are owed a rebate because their train was late or cancelled, an app that would surely be downloaded by every commuter in the country. "The industry is frankly a little bit terrified of this," Raper said. "The majority of passengers on the train who are entitled to a rebate don't claim that rebate. So there's a potential big downside for [train operators]."

The problem lies in the fine line between what's labelled as open and what's truly open. An NRE spokeswoman stressed that the Darwin data is not under a commercial licence and that "the terms are comparable to those used by both Network Rail and TfL [Transport for London] in their open data offering." Raper disagreed, saying it's not "open data comparable" because of the restriction in the terms.

NRE said the feed is being used by other software developers, but "high demand" meant it couldn't be supplied to everyone. "We asked for Darwin Timetable access when it became available in June and were told it was not available," Raper countered. "So if this feed is being used by 'a number of software developers' we have been singled out for different treatment, possibly because we are a competitor." It would make more sense, he argued, to simply release the raw data, without the restrictions that come with locking it down in the Darwin feed.

The we-said, they-said data disagreement sends Raper into a 20-minute rant. "It's a significant commercial obstacle to us not to have access to the Tyrell cancellations," he told me. "That sort of cancellation information should have the widest dissemination, because that's the one thing everybody wants to know."

Jeni Tennison, the technical director at the Open Data Institute, agreed the cancellation data was "really important" to train travellers, not only for real-time updates but also for long-term analysis of how many cancellations there are for a particular line. "You can't get the full benefit of all data on transport when key pieces are missing," she told me, comparing it to a jigsaw puzzle.

She said part of the problem arises when public services are provided by the private sector: such firms claim that selling data is a revenue stream for them, and ask for public subsidies to make it open. For example, the state-owned Ordnance Survey mapping company has made much of its data open, but takes a £10 million annual subsidy to do so, according to The Independent. "It's particularly frustrating when organisations aren't making much money from it," Tennison added, noting that the costs of selling data, including lawyers to draw up licences and enforce their terms, often outstrip any revenue.

Tennison hopes that won't remain the case. "What we see and hope for long term is a future where there's more of an open-data culture," she said, one where open data is published all the time and is seen as a "thing that people do." As happened with the web, any organisations that resist such a shift will see alternatives pop up to replace them, she predicted, pointing to the crowd-sourced OpenStreetMap, which offered free maps when Ordnance Survey was still charging.

It's easy to imagine someone developing a scraper to look for pissed-off tweets from delayed passengers, but it'd be simpler to get the data direct in an open format—or for trains to always run on time. Fire up Twitter, because neither looks likely at the moment.