Clear Sky Science
WxC-Bench: A Novel Dataset for Weather and Climate Downstream Tasks
Why Smarter Weather Data Matters
From bumpy airplane rides to flooding rains and strengthening hurricanes, the atmosphere affects daily life in countless ways. In recent years, artificial intelligence has begun to forecast weather faster and sometimes more accurately than traditional computer models. But these powerful systems are usually trained for just one job at a time and depend on painstakingly prepared data. This paper introduces WxC-Bench, a new open dataset built to give AI a richer, cleaner view of our atmosphere so that a single model can learn many different weather and climate tasks instead of just one.

Bringing Many Kinds of Weather Data Together
WxC-Bench (short for Weather and Climate Bench) starts from a simple idea: if we want general-purpose AI for Earth’s atmosphere, we need a single, well-organized place where many kinds of weather data and problems are brought together. Today’s leading AI weather systems typically focus on medium-range forecasting—predicting conditions days ahead—using one large pool of data. WxC-Bench goes further. It gathers information from satellites, long-running weather reanalyses, high-resolution forecast models, hurricane archives, and even pilots’ reports from the cockpit. The authors clean and standardize these sources so they can be used directly by machine-learning tools, reducing the time and expertise needed to prepare data for new studies.
Six Real-World Weather Problems in One Bench
Rather than centering on a single forecast skill score, WxC-Bench is organized around six practical tasks that span different time and space scales. At one extreme is aviation turbulence, a short-lived, small-scale hazard that can jolt aircraft without warning. Here, the dataset links daily snapshots of the atmosphere over the United States to reports filed by pilots, allowing AI models to learn where rough air tends to occur. Another task focuses on gravity waves: ripples in the air that move energy and momentum between layers of the atmosphere and are notoriously hard to represent in climate models. For this, WxC-Bench provides global fields of winds and temperatures, along with the subtle momentum fluxes those waves carry, giving AI a rare training ground for processes that traditional models must approximate.
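For readers who think in code, the turbulence task's idea of linking pilot reports to atmospheric snapshots can be pictured as marking which cells of a map grid received a report. The sketch below is illustrative only: the grid origin, the one-degree spacing, and the function names are assumptions made for the example, not details taken from WxC-Bench.

```python
def grid_index(lat, lon, lat0, lon0, step):
    """Map a point to the (row, col) of a regular grid whose corner is (lat0, lon0)."""
    return int((lat - lat0) // step), int((lon - lon0) // step)


def label_grid(reports, lat0, lon0, step, n_rows, n_cols):
    """Return an n_rows x n_cols grid holding 1 where a report fell and 0 elsewhere."""
    labels = [[0] * n_cols for _ in range(n_rows)]
    for lat, lon in reports:
        row, col = grid_index(lat, lon, lat0, lon0, step)
        # Reports outside the grid are ignored rather than clipped to the edge.
        if 0 <= row < n_rows and 0 <= col < n_cols:
            labels[row][col] = 1
    return labels


# Hypothetical reports as (latitude, longitude); the second lies outside the grid.
reports = [(31.2, -99.5), (60.0, -99.5)]
labels = label_grid(reports, lat0=30.0, lon0=-100.0, step=1.0, n_rows=3, n_cols=3)
```

A model would then be trained to predict such a label grid from the weather fields valid on the same day.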

From Historic Patterns to Future Rain and Storms
Other WxC-Bench tasks look outward in time and space. A weather “analog” dataset helps AI find past situations that resemble a current pattern, the way a human forecaster recalls past storms. The authors slice a global reanalysis into hundreds of overlapping tiles, so models can search for similar pressure or temperature patterns either locally or worldwide. For longer horizons, a precipitation dataset asks models to predict daily rainfall up to several weeks ahead—precisely the time window that is crucial for farming and water planning, yet where today’s forecasts often fail. This collection uses almost forty years of satellite observations and best-available rainfall estimates, letting AI learn how large-scale cloud patterns today relate to rain many days later.
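The tiling and analog-search idea can be sketched in a few lines. This is a toy illustration under stated assumptions: the tile size, the stride, the use of root-mean-square error as the similarity measure, and all function names are choices made for the example and are not taken from the WxC-Bench code. It also ignores the wrap-around in longitude that a real global grid would need.

```python
import math


def overlapping_tiles(n_rows, n_cols, tile, stride):
    """Top-left corners of square tiles; a stride smaller than the tile size gives overlap."""
    rows = range(0, n_rows - tile + 1, stride)
    cols = range(0, n_cols - tile + 1, stride)
    return [(r, c) for r in rows for c in cols]


def extract_tile(field, corner, tile):
    """Flatten the tile x tile patch of a 2-D field that starts at the given corner."""
    r, c = corner
    return [value for row in field[r:r + tile] for value in row[c:c + tile]]


def rmse(a, b):
    """Root-mean-square difference between two equally sized patches."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def find_analog(query, archive):
    """Index of the archived patch most similar to the query (smallest RMSE)."""
    return min(range(len(archive)), key=lambda i: rmse(query, archive[i]))


# Toy 4 x 6 "pressure" field cut into 2 x 2 tiles that overlap by one cell.
field = [[r * 10 + c for c in range(6)] for r in range(4)]
corners = overlapping_tiles(4, 6, tile=2, stride=1)
archive = [extract_tile(field, corner, 2) for corner in corners]
best = find_analog([12, 13, 22, 23], archive)
```

Searching only the tiles around one location gives a local analog, while searching every tile gives a worldwide one.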
Hurricanes, Flight Safety, and Plain-Language Forecasts
WxC-Bench also targets high-impact extremes and communication. A hurricane dataset compiles more than four decades of storm tracks and intensities from all major ocean basins, capturing everything from weak tropical storms to the most destructive Category 5 systems. By combining so many regions and environments, it lets AI explore which conditions favor rapid intensification or unusual paths. Finally, a natural-language task pairs gridded weather maps over the United States with human-written forecast discussions. After careful text cleaning—removing clutter like punctuation noise and repeated filler words—this part of the bench trains models to turn complex maps into clear written summaries, moving AI one step closer to drafting human-friendly forecasts.
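As a rough picture of what such text cleaning might involve, the sketch below collapses runs of repeated punctuation and immediately repeated words using Python's standard re module. The specific rules and the function name are assumptions made for illustration; the authors' actual cleaning steps may differ.

```python
import re


def clean_forecast_text(text):
    """Apply a few simple cleanup rules to a forecast discussion (illustrative only)."""
    # Collapse runs of the same punctuation mark: "...." becomes ".", "!!!" becomes "!".
    text = re.sub(r"([^\w\s])\1+", r"\1", text)
    # Collapse immediately repeated words: "the the" becomes "the".
    text = re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text, flags=re.IGNORECASE)
    # Squeeze runs of whitespace into single spaces and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_forecast_text("Rain likely.... across the the   region!!!")
```

Rules like these are deliberately conservative: they remove obvious clutter without altering the meteorological content of the text.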
Testing the Data with Baseline AI Models
To show that these curated datasets are truly ready for machine learning, the authors run a series of baseline models for each task. Simple neural networks can already distinguish turbulent from calm regions better than older methods; a specialized network can reproduce key patterns of gravity-wave effects around mountain ranges and storm tracks; an image-search model successfully finds past weather maps that resemble a given pattern; and an auto-regressive system trained on satellite data can predict rainfall weeks ahead with skill comparable to that of respected international forecast centers at longer lead times. For hurricanes and forecast text, modern architectures such as FourCastNet and vision–language models demonstrate that the data can support realistic storm tracking and reasonable written summaries, even if there is room for improvement.
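The term "auto-regressive" means that each prediction is fed back in as the input for the next step, so a model that advances the atmosphere by one day can be chained to reach several weeks. The sketch below shows only that loop. The one-step "model" is a stand-in toy function, not the rainfall system described above, and the names are hypothetical.

```python
def rollout(step_fn, initial_state, n_steps):
    """Apply a one-step model repeatedly, feeding each output back in as the next input."""
    states = []
    state = initial_state
    for _ in range(n_steps):
        state = step_fn(state)
        states.append(state)
    return states


def toy_step(value):
    """Stand-in for a trained one-step model: simply halves the value."""
    return 0.5 * value


forecast = rollout(toy_step, initial_state=8.0, n_steps=3)
```

Because every step builds on the previous prediction, small errors can accumulate, which is one reason skill at long lead times is hard to achieve.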
What This Means for Future Weather AI
Taken as a whole, WxC-Bench is less a single dataset than a toolbox for building and testing the next generation of weather and climate AI. By covering problems from seconds to weeks, and from local turbulence to global storm statistics and plain-language reports, it challenges AI systems to generalize beyond one narrow job. Because WxC-Bench is openly available, with code and a Python package for easy access, researchers can benchmark new foundation models, compare them fairly, and gradually expand the collection with new tasks. For a lay reader, the bottom line is that better-organized data like WxC-Bench brings us closer to AI systems that can foresee dangerous storms earlier, guide safer flights, support water and farm planning, and explain tomorrow’s weather in everyday language.
Citation: Shinde, R., Ankur, K., Phillips, C.E. et al. WxC-Bench: A Novel Dataset for Weather and Climate Downstream Tasks. Sci Data 13, 596 (2026). https://doi.org/10.1038/s41597-026-06839-7
Keywords: artificial intelligence, weather forecasting, climate data, hurricanes, precipitation prediction