
    Image: CDC/Flickr

    This Math Model Is Predicting the Ebola Outbreak with Incredible Accuracy

    Written by

    Michael Byrne


    Part of the allure of epidemiology is being able to describe and predict highly dynamic outbreaks with simple, clean mathematical models. But how close can models really get to perfectly mapping the spread of disease? 

    Modeling how disease spreads early in an outbreak is a major challenge, as sample sizes remain low and the number of variables high. But a recently developed method of making short-term outbreak projections, called the IDEA model, has shown promise, and is even doing an excellent job of tracking the current Ebola outbreak.

    "If validated, the implications of such a finding may be profound," wrote the model's creators in an open-access 2013 paper in PLOS One, "e.g., the ability to project, with a high degree of accuracy, the final size and duration of a seasonal influenza outbreak within 2 weeks of onset."

    The graph above shows how the model is faring with the current Ebola outbreak. So far, it's nearly perfect. If the IDEA model continues to predict the epidemic with the same accuracy, we can expect Ebola to start burning out in December, with a total of 14,000 cases. Currently, according to the CDC, there are or have been 8,400 cases. We have a ways to go.

    So how does the model work? A few weeks ago, we discussed the infamous r_0 number—which is used to calculate the transmissibility of a disease in terms of additional infections per infected individual—and a model known as SIR, which describes the powerful dynamics involved in mixing susceptible (S), infected (I), and immune (R, for recovered) segments of a population that's exposed to infection.

    The SIR model is classically used to see how much an infection can grow within a population, with those susceptible becoming infected, and the infected sometimes becoming recovered or immune. (A good explainer example is this model of a potential zombie outbreak.) When combined with r_0, the models can give us the force of an infection.
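
    To make the S, I, and R bookkeeping concrete, here is a minimal sketch of a basic SIR simulation stepped forward one day at a time. The function name, the beta (transmission) and gamma (recovery) rates, and the population size are all illustrative assumptions, not values taken from any real outbreak.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Step a basic SIR model forward one day at a time (simple Euler steps)."""
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population  # susceptible -> infected
        new_recoveries = gamma * i                  # infected -> recovered
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical rates: beta / gamma = 1.5 plays the role of r_0 here.
    history = simulate_sir(population=10_000, initial_infected=1,
                           beta=0.3, gamma=0.2, days=200)
    peak_day = max(range(len(history)), key=lambda day: history[day][1])
    print("peak day:", peak_day)
    print("infected at peak:", round(history[peak_day][1]))
    print("ever infected:", round(history[-1][1] + history[-1][2]))
```

    In this toy version, the ratio beta / gamma stands in for r_0: raise it and the infected curve climbs faster and peaks higher.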

    Generally, epidemic models grow from the SIR framework, with each one adding a new "compartment." For example, the SEIR model adds an "E" for a population group that's been exposed, and is incubating the pathogen, but isn't yet infectious—such as when US Ebola patient zero Thomas Eric Duncan boarded his plane from Liberia in September. 

    The MSIR model adds "M," a group with natural, born-with-it immunity. Meanwhile, the SIS model actually removes the immune group entirely from the equations, a situation that fits the common cold and flu, in which being infected once offers no future protection.

    There are several other variations on the basic compartmental model, but this is hardly the only modeling strategy out there. Both generally and as a way of informing the models above, we might turn to the IDEA model. 

    IDEA stands for "incidence decay and exponential adjustment." Yes, finally, we get to really talk about exponential things in the proper sense, rather than the usual casual redefinition of the term to mean "a lot."

    One of the IDEA scheme's creators, Amy Greer, writes that the model is "based on the idea that we could use simple types of public health surveillance data and turn that information into reliably accurate projections of what might happen in the outbreak in the short-term."

    The model attempts to make up for the usual shortcomings of the r_0 number, which, according to the IDEA creators, often fails to accurately account for epidemic control efforts. 

    As with the compartmental models, r_0 is at its best at the very beginning of an outbreak, when it is estimated from sets of initial values. Things change fast in an outbreak, however, and public health responses can add a ton of variables to the mix.

    Again, in the case of Ebola, how could a researcher have modeled the way misinformation and protests have undermined quarantine efforts? This is where IDEA is designed to be most effective.

    If you remember, r_0 is technically defined as the average number of secondary infections that can be expected to result from one primary infection. In other words, it's how many people each infected person can expect to transmit the disease to before they, the primary case, are no longer infectious.

    Ebola sits at around r_0 = 1.5 in the United States and closer to 2 in West Africa, where the disease has a higher chance of spreading. Keep in mind the 1.5 is an initial value and as more control measures are taken, it should decline.
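
    To see why the gap between 1.5 and 2 matters, here is a small sketch that counts expected new cases per generation of infection if r_0 stayed fixed and nothing slowed the spread. That is a deliberate simplification, and the function name is made up for illustration.

```python
def cases_per_generation(r0, generations):
    """Expected new cases in each generation if r0 never changes."""
    return [r0 ** t for t in range(generations + 1)]


if __name__ == "__main__":
    for r0 in (1.5, 2.0):
        counts = cases_per_generation(r0, 10)
        print(f"r_0 = {r0}: generation 10 has about {counts[-1]:.0f} new cases")
```

    After ten generations, a single case has grown to roughly 58 new cases per generation at r_0 = 1.5, versus 1,024 at r_0 = 2.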

    Measuring the decline is where things get murky, according to Greer. Her model uses a new term d to modify r_0 like this:
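
    For reference, the relationship as given in the creators' PLOS One paper can be transcribed as follows. Here t counts disease generations in the paper's formulation, and the typesetting is a transcription rather than the original figure:

```latex
I(t) = \left[ \frac{r_0}{(1 + d)^{t}} \right]^{t}
```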

    The main thing here is the d, a factor representing a discount function that changes through time, so named because it resembles discounting in financial models. Here it's meant to represent the efforts taken to control the epidemic, such as vaccinations and quarantines. The larger d gets, the smaller the I result, which is the number of total infected individuals.

    Using this first I, we can find out how I changes through time, given by this equation, where the R_e(t) at time 0 is just r_0:

    So, multiplying the R value at a given time, R_e(t), by the first equation we got using d will tell us how many infected individuals we can expect at the next time interval (days, probably). 
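
    A rough sketch of how d bends the curve, assuming the IDEA form I(t) = (r_0 / (1 + d)^t)^t from the PLOS One paper. The function names and the values of r_0 and d are hypothetical. The "growth factor" below is simply the ratio of one generation's cases to the previous generation's, which is one way to read an effective reproduction number and not necessarily the paper's exact definition.

```python
def idea_incidence(r0, d, t):
    """New cases in generation t under I(t) = (r0 / (1 + d)**t)**t."""
    return (r0 / (1.0 + d) ** t) ** t


def growth_factors(r0, d, generations):
    """Ratio of each generation's cases to the previous generation's."""
    counts = [idea_incidence(r0, d, t) for t in range(generations + 1)]
    return [later / earlier for earlier, later in zip(counts, counts[1:])]


if __name__ == "__main__":
    r0 = 2.0  # illustrative starting value
    for d in (0.0, 0.01, 0.05):  # hypothetical discount factors
        final = idea_incidence(r0, d, 10)
        print(f"d = {d}: generation 10 has about {final:.1f} new cases")
    # With d > 0 the generation-to-generation growth shrinks steadily.
    print([round(g, 2) for g in growth_factors(r0, 0.05, 5)])
```

    With d = 0 this collapses to plain r_0^t growth. Any positive d makes each generation's growth factor smaller than the last.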

    All that is to say that the IDEA model offers a much more dynamic way to look at transmissibility, because the estimate is continuously modified by the observed effects of whatever control mechanisms we put into place to limit the epidemic.

    Algebraically twisting around the equations above, along with other equations in the model that predict changes in an epidemic's immune and susceptible populations, gives us some other useful predictions: the expected time at which an epidemic is likely to stop growing, an estimated maximum number of total infected individuals, and so on. The model can also give epidemiologists a way of determining how effective their control measures are.
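
    As a sketch of the kind of algebra involved (not the paper's own derivation), assume the IDEA form I(t) = (r_0 / (1 + d)^t)^t with d > 0 and t measured in generations. Taking logs gives a quadratic in t, and the peak and end points fall out directly. The labels t_peak and t_end are introduced here for illustration:

```latex
\begin{aligned}
\ln I(t) &= t \ln r_0 - t^{2} \ln(1 + d) \\
\frac{\mathrm{d}}{\mathrm{d}t} \ln I(t) = 0
  &\;\Rightarrow\; t_{\text{peak}} = \frac{\ln r_0}{2 \ln(1 + d)} \\
I(t) < 1 &\;\Leftrightarrow\; t > t_{\text{end}} = \frac{\ln r_0}{\ln(1 + d)} = 2\, t_{\text{peak}}
\end{aligned}
```

    With hypothetical values r_0 = 2 and d = 0.05, new cases per generation would peak around generation 7 and fall below one case per generation a little after generation 14. Summing I(t) up to that point gives a rough estimate of the final size.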

    Greer and her team tested the model out on data from an H1N1 outbreak in Nunavut, Canada (a reasonably isolated population). You can see the results below. Not bad: the models tracked the observed data pretty well. (Note that SI refers to how many different time intervals, the t values above, are calculated.)

    Image: Greer et al

    In simulated epidemics, the researchers found that their model did very well with low or moderately low starting r_0 values, which SIR can have a difficult time with. According to Greer and her team, the IDEA prediction was a near-perfect fit.
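
    One way such a fit could work, sketched under stated assumptions: under the IDEA form I(t) = (r_0 / (1 + d)^t)^t, the log of the case count is linear in the two unknowns ln(r_0) and ln(1 + d), so ordinary least squares on a handful of early generations recovers both. The function name and the synthetic data are invented for illustration, and the paper's actual fitting procedure may differ.

```python
import math


def fit_idea(counts):
    """Least-squares fit of r0 and d from case counts for generations 1, 2, 3, ...

    Uses ln I(t) = t*ln(r0) - t**2*ln(1 + d), which is linear in
    a = ln(r0) and b = ln(1 + d).
    """
    if len(counts) < 2 or min(counts) <= 0:
        raise ValueError("need at least two generations with positive counts")
    ts = range(1, len(counts) + 1)
    ys = [math.log(c) for c in counts]
    s_t2 = sum(t ** 2 for t in ts)
    s_t3 = sum(t ** 3 for t in ts)
    s_t4 = sum(t ** 4 for t in ts)
    s_ty = sum(t * y for t, y in zip(ts, ys))
    s_t2y = sum(t ** 2 * y for t, y in zip(ts, ys))
    # Normal equations for y = a*t - b*t**2:
    #   a*s_t2 - b*s_t3 = s_ty
    #   a*s_t3 - b*s_t4 = s_t2y
    det = s_t3 * s_t3 - s_t2 * s_t4
    a = (s_t3 * s_t2y - s_ty * s_t4) / det
    b = (s_t2 * s_t2y - s_t3 * s_ty) / det
    return math.exp(a), math.exp(b) - 1.0


if __name__ == "__main__":
    # Synthetic, noise-free epidemic with a low r_0 (hypothetical values).
    true_r0, true_d = 1.8, 0.02
    early = [(true_r0 / (1 + true_d) ** t) ** t for t in range(1, 5)]
    r0, d = fit_idea(early)
    print(f"fitted r_0 = {r0:.3f}, fitted d = {d:.3f}")
```

    On clean synthetic data like this, four generations are enough to recover the inputs. Real surveillance counts are noisy, so an actual fit would be less tidy.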

    "We found that best-fit projections for the IDEA model for disease dynamic systems with low or intermediate r_0 were exceedingly good, with parameters derived within 3–4 generations able to project the full extent of simulated epidemics with remarkable accuracy," the team concluded in their PLOS One paper.