Clear Sky Science · en
A disease-agnostic approach to ensemble learning for infectious disease forecasting
Why better disease forecasts matter
When a new infectious disease appears, public health officials must make rapid choices about vaccines, hospital capacity, and social measures using only a few weeks of data. Forecasts from mathematical and computer models guide these decisions, but no single model is reliable in every situation. This paper introduces a way to combine many different forecasting approaches into a single, more accurate forecast that can work even when a disease is new and historical data are scarce.
Blending many forecasting tools
Scientists often improve predictions by forming an ensemble, a combined forecast from several individual models. A simple method gives each model equal influence, which is safe but can be wasteful when some models clearly perform better than others. More sophisticated methods try to learn which models deserve more weight from past performance, but they usually require years of detailed data for a single disease. That makes them poorly suited to fast-moving outbreaks like COVID-19, where such records do not yet exist.
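The difference between the two ensemble styles described above can be shown with a minimal sketch. The function names and the inverse-error weighting rule here are illustrative assumptions, not the method used in the paper; inverse-error weighting is just one common way to reward models that did well in the past.

```python
def equal_weight_ensemble(forecasts):
    """Average the forecasts, giving every model the same influence."""
    return sum(forecasts) / len(forecasts)


def inverse_error_weights(past_errors, eps=1e-9):
    """Assumed weighting rule: smaller past error means larger weight."""
    return [1.0 / (err + eps) for err in past_errors]


def weighted_ensemble(forecasts, weights):
    """Weighted average of the forecasts; weights are normalised here."""
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, forecasts)) / total


# Three hypothetical models predict next week's case count.
forecasts = [100.0, 120.0, 200.0]
# Made-up past errors: the third model has been much less accurate.
past_errors = [10.0, 10.0, 40.0]

simple = equal_weight_ensemble(forecasts)
tuned = weighted_ensemble(forecasts, inverse_error_weights(past_errors))
```

In this toy case the equal-weight forecast is pulled toward the third model's high prediction, while the weighted forecast stays closer to the two models with the better track record. The catch noted above still applies: the weighted version needs a record of past errors, which a brand-new disease does not have.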

Tuning the mix without disease-specific data
The authors propose a new framework, called epiFFORMA, that learns how to weight models without relying on historical records for a specific disease. Instead, they generate a large library of realistic but fully synthetic outbreak curves using standard disease spread equations. For each synthetic outbreak, they run nine common forecasting models and record which ones do best at different points in the trajectory. They also translate each outbreak curve into a compact set of descriptive features, such as how quickly cases are changing, how close the series is to a recent peak, and how strong seasonal patterns appear.
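As a rough illustration of what a synthetic outbreak and its descriptive features might look like, the sketch below uses a discrete-time SIR (susceptible, infected, recovered) model. This is an assumption: the summary only says "standard disease spread equations", and the parameter values, function names, and the three features shown are hypothetical stand-ins for the richer simulator and feature set used by the authors. Seasonality features are omitted for brevity.

```python
def simulate_sir(beta, gamma, population, initial_infected, steps):
    """Return new infections per time step from a simple discrete SIR model."""
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    new_cases = []
    for _ in range(steps):
        infections = beta * susceptible * infected / population
        recoveries = gamma * infected
        susceptible -= infections
        infected += infections - recoveries
        new_cases.append(infections)
    return new_cases


def describe_series(series, window=4):
    """Compress a case curve into a few illustrative features."""
    recent = series[-window:]
    peak = max(series)
    return {
        # Relative change across the most recent window of observations.
        "recent_change": (recent[-1] - recent[0]) / (abs(recent[0]) + 1e-9),
        # Latest value as a fraction of the highest value seen so far.
        "fraction_of_peak": series[-1] / peak if peak > 0 else 0.0,
        # Number of time steps elapsed since that highest value.
        "steps_since_peak": len(series) - 1 - series.index(peak),
    }


# One synthetic outbreak with made-up parameters.
curve = simulate_sir(beta=0.5, gamma=0.2, population=10_000,
                     initial_infected=10, steps=60)
features = describe_series(curve)
```

Repeating this with many different parameter settings would give a library of curves, each paired with a feature vector, which is the kind of training material the paragraph above describes.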
Teaching a meta-model to choose
Using this synthetic library, the team trains a separate machine learning system to connect time-series features with good choices of model weights. Rather than learning to favor specific named models, epiFFORMA learns patterns such as when to trust forecasts that are near the middle of all model predictions or when to down-weight extreme high or low forecasts. Once trained, this meta-model can be applied to a real outbreak: features are computed from the observed case counts, each component model produces a short-term forecast, and epiFFORMA assigns weights to blend them into a single prediction.
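The idea of weighting forecasts by where they fall among all predictions, rather than by which named model produced them, can be sketched as follows. The function `toy_rank_weights` is a hand-written, hypothetical stand-in for the trained meta-model: its rule (favor the middle, and favor it more strongly when cases are changing fast) is invented for illustration and is not a pattern reported by the authors.

```python
def blend_by_rank(forecasts, rank_weights):
    """Blend forecasts using weights tied to sorted position.

    rank_weights[k] applies to the k-th smallest forecast, so the
    weights never depend on which named model made each prediction.
    """
    if len(forecasts) != len(rank_weights):
        raise ValueError("need one weight per forecast")
    ordered = sorted(forecasts)
    total = sum(rank_weights)
    return sum(w * f for w, f in zip(rank_weights, ordered)) / total


def toy_rank_weights(features, n_models):
    """Hypothetical rule standing in for a trained meta-model."""
    middle = (n_models - 1) / 2
    # Triangular weights: largest in the middle, smallest at the extremes.
    weights = [1.0 + middle - abs(k - middle) for k in range(n_models)]
    # Invented rule: when cases change quickly, trust the middle even more.
    if abs(features.get("recent_change", 0.0)) > 0.5:
        weights = [w * w for w in weights]
    return weights


# Three hypothetical component forecasts, one of them an extreme outlier.
forecasts = [400.0, 100.0, 110.0]
weights = toy_rank_weights({"recent_change": 0.0}, len(forecasts))
blended = blend_by_rank(forecasts, weights)
```

Here the outlying forecast of 400 receives less weight than it would in a plain average, so the blended value sits below the equal-weight mean. In the real framework the weights would come from a model trained on the synthetic library rather than from a fixed rule.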

How well the method performs
The researchers tested epiFFORMA on 11 large datasets covering diseases including COVID-19, influenza-like illness, dengue, measles, mumps, polio, rubella, smallpox, and chikungunya, across many regions and years. They compared three options: each individual model on its own, a simple equal-weight average, and the epiFFORMA combination. Across standard measures of error used in disease forecasting, epiFFORMA was on average more accurate than both equal weighting and most individual models. It especially improved forecasts just after case counts peaked or when cases began to surge, situations where some models systematically over- or under-reacted. Even in the few disease settings where epiFFORMA did not win outright, its performance was very close to the best alternative.
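A comparison of this kind boils down to scoring each method's forecasts against what was later observed. The sketch below uses mean absolute error as an assumed example metric; the summary does not name the specific error measures, and the numbers are made up purely to show the mechanics, not to reproduce any result from the study.

```python
def mean_absolute_error(forecasts, observed):
    """Average absolute gap between forecasts and observed values."""
    if len(forecasts) != len(observed) or not observed:
        raise ValueError("need equally sized, non-empty sequences")
    return sum(abs(f - o) for f, o in zip(forecasts, observed)) / len(observed)


def rank_methods(forecasts_by_method, observed):
    """Return (method, error) pairs sorted from most to least accurate."""
    scores = {
        name: mean_absolute_error(fc, observed)
        for name, fc in forecasts_by_method.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up observed counts and forecasts from two hypothetical methods.
observed = [10.0, 20.0, 30.0]
candidates = {
    "method_a": [12.0, 18.0, 33.0],
    "method_b": [10.0, 25.0, 30.0],
}
ranking = rank_methods(candidates, observed)
```

The first entry of `ranking` is the method with the lowest error on this toy data. Averaging such scores over many regions, dates, and diseases is the general shape of the evaluation described above.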
What this means for future outbreaks
To a non-expert, the main message is that the authors have built a way to pre-train an outbreak “forecast combiner” using simulated epidemics so that it is ready when the next real threat appears. Because epiFFORMA does not require detailed past data for the specific pathogen, it can be deployed early in an emerging epidemic and still offer an advantage over simply averaging existing models. This approach offers health agencies a more flexible and generally reliable forecasting tool that can adapt to many diseases while keeping the stability and safety of traditional ensemble forecasts.
Citation: Murph, A.C., Beesley, L.J., Gibson, G.C. et al. A disease-agnostic approach to ensemble learning for infectious disease forecasting. Nat Commun 17, 4255 (2026). https://doi.org/10.1038/s41467-026-70937-8
Keywords: infectious disease forecasting, ensemble modeling, synthetic outbreak data, emerging epidemics, machine learning