Clear Sky Science

A Benchmark Dataset for Satellite-Based Estimation and Detection of Rain


Why Watching Rain from Space Matters

Rain shapes our harvests, fills our reservoirs, and fuels dangerous floods and landslides. Yet, surprisingly, we still do not know exactly how much rain is falling everywhere on Earth at any given time. Ground instruments are sparse over oceans and in many countries, and even modern satellites see only part of the picture. This article introduces SatRain, a new global benchmark dataset designed to help the scientific and tech communities build and fairly compare artificial intelligence (AI) methods that estimate rain from space. Better tools for watching rain from orbit can improve weather warnings, water management, and our understanding of how climate change is altering storms.

Figure 1.

Different Eyes on the Same Storm

Measuring precipitation is harder than it sounds because it is patchy, constantly changing, and can fall as drizzle, downpours, snow, or hail. Traditional tools each have strengths and weaknesses. Rain gauges measure water directly in one spot, but there are few of them, especially over oceans and in poorer regions. Weather radar paints detailed maps of rain over land, yet its coverage fades with distance and is blocked by terrain. Satellites are the only way to monitor precipitation almost everywhere, but they do not sense raindrops directly. Instead, they detect light and microwaves affected by clouds and falling particles, and scientists must work backwards to infer how much rain reaches the ground.

How Satellites See Rain

Satellites use several types of sensors that each tell part of the story. Geostationary satellites, parked high above the equator, watch the same region continuously in visible and infrared light, tracking cloud tops but not the rain beneath. Lower-orbit satellites carry passive microwave instruments that sense faint emissions and scattering caused by raindrops and ice particles; these have a closer link to actual rainfall but see any one location only every few hours and at coarser resolution. A very small number of spaceborne radars can measure precipitation more directly, but they cannot cover the globe often. Because each sensor has gaps, modern rainfall maps combine many sources and, increasingly, rely on machine learning to squeeze more information out of the data.

Figure 2.

Building a Fair Testbed for Rain AI

Until now, researchers have trained AI models for satellite rainfall estimation on different regions, time periods, sensors, and resolutions, making it nearly impossible to tell whether one method truly beats another. The International Precipitation Working Group created SatRain to solve this. SatRain brings together multi-sensor satellite observations (visible, infrared, and microwave) along with high-quality "truth" data from gauge-corrected weather radar over the contiguous United States. All information is carefully aligned on common grids or along the native satellite scan paths, and the dataset is split into training, validation, and test sets following modern machine-learning practice. To test how well methods generalize beyond North America, SatRain also includes independent test data from Korea and Austria, based on local radar composites and dense rain-gauge networks.
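As a rough illustration of what a training, validation, and test split can look like, the sketch below assigns each satellite scene to a split based on its date, so that all scenes from the same day end up together. The day-of-month rule and the function name assign_split are assumptions made for this example only; the article does not describe how SatRain actually divides its data.

```python
from datetime import date


def assign_split(scene_date: date) -> str:
    """Assign a scene to a split by day of month.

    Hypothetical rule for illustration; not SatRain's actual procedure.
    """
    day = scene_date.day
    if day <= 20:
        return "training"
    if day <= 25:
        return "validation"
    return "test"


# Made-up scene dates, grouped by the split they fall into.
scenes = [date(2021, 6, 3), date(2021, 6, 22), date(2021, 6, 28), date(2021, 7, 14)]
splits = {}
for scene in scenes:
    splits.setdefault(assign_split(scene), []).append(scene)

for name, members in splits.items():
    print(name, [d.isoformat() for d in members])
```

Splitting by whole days rather than by individual pixels is one common way to keep scenes of the same storm from appearing in both the training and the test data, which would make a model look better than it is.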

Putting AI Methods Head-to-Head

Using SatRain, the authors trained several AI models to estimate how much rain is falling and to detect where rain and heavy rain are occurring. They compared models that use only infrared cloud-top images, models that add many channels of visible and infrared data, and models that use microwave measurements. They also benchmarked different machine-learning techniques, from random forests and boosted trees to modern deep neural networks shaped like U-Nets. Across thousands of storm scenes, AI systems trained on SatRain were able to match or surpass leading operational products, including the widely used GPROF retrieval and ERA5 reanalysis, especially when using microwave inputs and advanced deep-learning architectures. The results held not only over the United States but also across the independent test regions, despite some regional biases.
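To make the two tasks concrete, here is a minimal sketch of how rain estimates might be scored against reference measurements. It uses standard forecast-verification scores (probability of detection, false alarm ratio, critical success index) for detection, and bias, mean absolute error, and root-mean-square error for estimation. The thresholds, function names, and sample values are assumptions for illustration; the metrics used in the paper itself may differ.

```python
import math


def detection_scores(estimated, reference, threshold=0.1):
    """Score rain / no-rain detection at a rain-rate threshold (mm/h, assumed)."""
    hits = misses = false_alarms = 0
    for est, ref in zip(estimated, reference):
        est_rain = est >= threshold
        ref_rain = ref >= threshold
        if est_rain and ref_rain:
            hits += 1
        elif ref_rain:
            misses += 1
        elif est_rain:
            false_alarms += 1
    observed = hits + misses
    predicted = hits + false_alarms
    any_event = hits + misses + false_alarms
    return {
        "pod": hits / observed if observed else math.nan,
        "far": false_alarms / predicted if predicted else math.nan,
        "csi": hits / any_event if any_event else math.nan,
    }


def estimation_scores(estimated, reference):
    """Score rain-rate estimates with bias, MAE, and RMSE."""
    errors = [est - ref for est, ref in zip(estimated, reference)]
    n = len(errors)
    return {
        "bias": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }


# Made-up rain rates in mm/h for six pixels.
reference = [0.0, 0.0, 1.0, 4.0, 12.0, 0.0]
estimated = [0.0, 0.5, 0.0, 3.0, 10.0, 0.0]

rain = detection_scores(estimated, reference, threshold=0.1)
heavy = detection_scores(estimated, reference, threshold=10.0)
amounts = estimation_scores(estimated, reference)
print(rain, heavy, amounts)
```

Scoring every method with the same functions on the same test scenes is what allows a benchmark to say that one approach outperforms another.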

What This Means for Everyday Life

SatRain is not itself a new global rainfall product; instead, it is a common playing field where scientists and developers can prove that their algorithms really work and compare them fairly. By knitting together many satellite sensors with some of the best available ground-based measurements, SatRain makes it easier to design AI models that see through clouds, read subtle signals in spaceborne data, and better track where and how hard it is raining. In the long run, methods refined and tested on SatRain can be transferred into the next generation of global precipitation datasets, improving flood warnings, drought monitoring, and climate research that affects people everywhere.

Citation: Pfreundschuh, S., Arulraj, M., Behrangi, A. et al. A Benchmark Dataset for Satellite-Based Estimation and Detection of Rain. Sci Data 13, 244 (2026). https://doi.org/10.1038/s41597-026-06565-0

Keywords: satellite rainfall, precipitation dataset, machine learning, remote sensing, climate monitoring