Clear Sky Science
Three decades of simulating global temperature patterns with coupled global climate models
Why Better Climate Maps Matter
When we hear about climate change, we often see a single global number: how much the planet has warmed. But in everyday life, what really matters is where and how temperatures change around the world. This study looks at how well climate models reproduce the detailed global pattern of surface temperatures, and how that skill has changed over three decades of model development, from early experiments in the 1990s to today’s cutting‑edge kilometer‑scale simulations. Understanding this progress tells us how much confidence we can place in the maps of future climate that guide planning, adaptation, and policy.

From Rough Sketches to Finer Pictures
Modern climate models simulate the atmosphere, oceans, land, and sea ice on a grid that wraps around the globe. In the 1990s these grids were relatively coarse, so small‑scale features like ocean eddies or mountain valleys had to be heavily simplified. Since then, computing power and scientific understanding have grown dramatically, allowing models to run with much finer spacing and more detailed physics. The authors focus on one simple but revealing question: how closely do different generations of models reproduce the 20‑year average pattern of near‑surface air temperature compared with observation‑based datasets?
Judging Models with Many Yardsticks
To evaluate performance, the study compares 176 climate model runs against 10 independent, observation‑based datasets that blend weather measurements, satellites, and other sources. Instead of just looking at global averages, the authors examine how similar the temperature pattern is at each location on Earth. A model scores well at a location when its local temperature falls within the spread of the reference datasets. Over time, the fraction of Earth’s surface where models match this reference range has increased from about one quarter for early models to over one third for the latest generation, Phase 6 of the Coupled Model Intercomparison Project (CMIP6). A few newer kilometer‑scale models, especially the IFS‑FESOM system, match or even surpass the best older models, coming close to how well the observation‑based datasets agree with one another.
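The idea of scoring a model by the share of Earth's surface where it falls inside the reference spread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: it assumes a regular latitude-longitude grid, takes the "spread" to be the minimum-to-maximum range across the references, and weights each grid cell by the cosine of its latitude so that small polar cells do not count as much as large tropical ones. The function names and the toy numbers are hypothetical.

```python
import numpy as np

def area_weights(lat_deg, n_lon):
    """Weights proportional to grid-cell area on a regular lat-lon grid (sum to 1)."""
    w = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones((1, n_lon))
    return w / w.sum()

def fraction_within_spread(model, refs, lat_deg):
    """Area fraction where the model lies inside the min-max range of the references.

    model: (n_lat, n_lon) time-mean near-surface temperature from one model
    refs:  (n_ref, n_lat, n_lon) the same quantity from each reference dataset
    Assumption: "spread" means the min-max envelope across references.
    """
    lo = refs.min(axis=0)
    hi = refs.max(axis=0)
    inside = (model >= lo) & (model <= hi)
    w = area_weights(lat_deg, model.shape[1])
    return float((w * inside).sum())

# Toy data (invented for illustration): 3 latitudes x 4 longitudes, 2 references.
lat = np.array([-60.0, 0.0, 60.0])
refs = np.stack([np.full((3, 4), 14.0), np.full((3, 4), 16.0)])
model = np.full((3, 4), 15.0)
model[0, :] = 20.0  # model is too warm along the southernmost latitude row

frac = fraction_within_spread(model, refs, lat)
print(round(frac, 3))  # 0.75
```

In this toy case the model misses the reference range along one of three latitude rows, but because that row sits at 60°S and covers less area than the equatorial row, the matched fraction is three quarters rather than two thirds.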
Persistent Trouble Spots on the Planet
Even as models improve, certain regions remain stubbornly difficult to simulate. The northern North Atlantic, the Southern Ocean, and areas with low marine clouds along the eastern edges of major ocean basins show large, long‑lasting temperature biases across many model generations. For example, several kilometer‑scale simulations are still too cold in parts of the North Atlantic, likely linked to how sea ice interacts with the ocean. These persistent hot and cold spots point to underlying physical processes that are still not fully captured and also represent areas where future progress could yield particularly large gains in realism.

Why the Choice of Reference Changes the Score
A key finding is that as models get better, differences between observational datasets start to matter more. Earlier work often judged models against a single reference product, quietly assuming that most of the discrepancy came from model error. By comparing each model to all 10 references separately, the authors show that for the newest, highest‑performing simulations, up to 40% of the apparent error can come from which reference is chosen rather than from the model itself. Even switching between two widely used reanalyses—ERA‑Interim and its successor ERA5—can systematically favor older or newer model generations. This means that relying on a single dataset can give a misleading picture of which models are “best.”
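A small sketch shows how the choice of reference can change which model looks best. It is a toy illustration rather than the study's method: the error measure is assumed to be an area-weighted root-mean-square error, and the two references and two models are invented constant fields with hypothetical names.

```python
import numpy as np

def weighted_rmse(model, ref, weights):
    """Area-weighted root-mean-square error; weights are assumed to sum to 1."""
    return float(np.sqrt((weights * (model - ref) ** 2).sum()))

# Equal weights on a tiny 2 x 2 grid, to keep the arithmetic easy to follow.
weights = np.full((2, 2), 0.25)

# Invented reference datasets that disagree by 1 degree everywhere.
refs = {
    "ref_a": np.full((2, 2), 14.0),
    "ref_b": np.full((2, 2), 15.0),
}

# Invented models, each sitting close to a different reference.
models = {
    "model_x": np.full((2, 2), 14.2),
    "model_y": np.full((2, 2), 14.9),
}

# Score every model against every reference separately and record the winner.
best = {}
for ref_name, ref in refs.items():
    scores = {name: weighted_rmse(field, ref, weights) for name, field in models.items()}
    best[ref_name] = min(scores, key=scores.get)

print(best)  # {'ref_a': 'model_x', 'ref_b': 'model_y'}
```

Neither model changes between the two comparisons, yet the ranking flips. Scoring against all references, as the authors do, makes this kind of sensitivity visible instead of hiding it behind a single dataset.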
Sharper Grids Are Helpful but Not Enough
Higher spatial resolution—using a finer grid—generally leads to better temperature patterns when models are carefully tuned for that resolution. Across the large CMIP archive, the authors find a clear tendency for models with finer grids to produce smaller temperature errors. However, when the same model is simply run at higher resolution without retuning, performance can stagnate or even worsen. This is evident in a special set of high‑resolution experiments where extra tuning was intentionally avoided: five out of six model pairs performed worse at finer resolution. In contrast, some kilometer‑scale prototypes already compete with or beat the best traditional models despite only limited tuning, underscoring both their promise and the work still needed to fully exploit their potential.
What This Means for Our Climate Future
Put simply, the study shows that climate models have steadily become better at mapping the planet’s temperature pattern, but the very best models have not always leapt ahead from one generation to the next. New kilometer‑scale simulations demonstrate that it is possible to push beyond today’s standards, yet fine grid spacing by itself is no magic fix. Careful model design, adjustment, and testing remain essential. At the same time, the growing influence of observational uncertainty means that model evaluations must account for differences among reference datasets instead of trusting just one. Together, these insights help scientists build more reliable “digital twins” of Earth—virtual laboratories that can more faithfully explore the climates we may face in the decades to come.
Citation: Brunner, L., Ghosh, R., Haimberger, L. et al. Three decades of simulating global temperature patterns with coupled global climate models. Commun Earth Environ 7, 400 (2026). https://doi.org/10.1038/s43247-026-03497-w
Keywords: climate models, global temperature patterns, kilometer-scale modeling, model evaluation, Earth system simulation