Clear Sky Science · en

SuryaBench: Benchmark Dataset for Advancing Machine Learning in Heliophysics and Space Weather Prediction


Why the Sun’s data deluge matters

Our technology-filled world quietly depends on the moods of the Sun. Solar storms can disturb GPS, radio links, and even electric power grids. NASA’s Solar Dynamics Observatory has spent more than a decade watching the Sun in exquisite detail, but the sheer volume and complexity of these images make them hard to use widely. This paper introduces SuryaBench, a carefully prepared collection of solar data designed so that modern artificial intelligence tools can learn to read the Sun and help us forecast space weather more reliably.

A clearer picture of our active star

SuryaBench starts from the raw stream of pictures and measurements captured by NASA’s Solar Dynamics Observatory since 2010. One instrument records light from very hot solar gas high above the surface, while another maps magnetic fields and motions on the surface itself. The authors align, clean, and standardize these images so that the Sun’s disk always appears in the same place, with the same size and sharpness, and with the effects of the spacecraft’s changing orbit and the aging of its cameras corrected. The result is a unified view of the Sun, at full original resolution, repeated every 12 minutes across almost an entire 11-year solar cycle.

Figure 1. How detailed views of the Sun feed AI systems that help protect satellites, power grids, and communications on Earth.

From messy measurements to machine-ready data

Turning this raw treasure trove into material that computers can learn from requires many careful steps. The team corrects the rotation and pointing of the spacecraft, adjusts for different exposure times so bright and faint features are treated fairly, and compensates for gradual wear in the detectors. They also bring together images from separate instruments so that structures seen in hot glowing gas line up exactly with the magnetic patterns that shape them. In addition, they fix the apparent size of the solar disk, which would otherwise change as the spacecraft’s distance to the Sun varies. These choices remove distractions that would confuse learning algorithms and let them focus on real solar behavior.
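Two of these corrections can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the single per-image degradation factor, and the example numbers are all assumptions made for illustration. The first function converts raw counts to counts per second and undoes a loss of detector sensitivity; the second computes the zoom needed to bring the solar disk to a fixed radius in pixels.

```python
def normalize_frame(pixels, exposure_s, degradation):
    """Return counts per second, corrected for detector sensitivity loss.

    pixels      -- image as a list of rows of raw counts (hypothetical layout)
    exposure_s  -- exposure time in seconds
    degradation -- assumed remaining sensitivity, 1.0 = new, 0.5 = half
    """
    if exposure_s <= 0:
        raise ValueError("exposure time must be positive")
    if not 0 < degradation <= 1:
        raise ValueError("degradation must be in (0, 1]")
    scale = 1.0 / (exposure_s * degradation)
    return [[value * scale for value in row] for row in pixels]


def disk_zoom_factor(observed_radius_px, target_radius_px):
    """Zoom factor that makes the solar disk the same size in every image."""
    if observed_radius_px <= 0:
        raise ValueError("observed radius must be positive")
    return target_radius_px / observed_radius_px


if __name__ == "__main__":
    raw = [[100.0, 200.0], [300.0, 400.0]]
    # A 2-second exposure on a detector at half its original sensitivity.
    print(normalize_frame(raw, exposure_s=2.0, degradation=0.5))
    # Disk appears slightly small, so the image must be enlarged.
    print(disk_zoom_factor(observed_radius_px=1600.0, target_radius_px=1680.0))
```

A real pipeline would also resample the image with the zoom factor and handle rotation and pointing, which this sketch leaves out.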

Built-in challenges for smart models

Rather than just offering cleaned images, SuryaBench also includes six ready-made “challenge sets” that capture key questions in solar and space weather research. One focuses on identifying magnetically active patches on the solar surface, another on spotting subtle signs that such a region is about to emerge, and a third on connecting surface magnetism to the looping magnetic field high in the corona. The other three link the solar images to events that matter near Earth: bursts of X-rays called solar flares, changes in the solar wind gusting past Earth, and variations in the Sun’s extreme ultraviolet output that affect the upper atmosphere and satellite drag. Each task comes with standard data splits and example machine learning models, so researchers can compare methods on equal footing.
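To make the idea of a standard data split concrete, here is a small Python sketch. This summary does not describe how SuryaBench actually divides its data, so the calendar-month rule below is purely an assumed example, as are the function names. The point it illustrates is that when every researcher uses the same fixed rule to decide which images are for training and which are held back for testing, their results become directly comparable.

```python
from datetime import datetime, timedelta


def make_timestamps(start, end, cadence_minutes=12):
    """List observation times from start (inclusive) to end (exclusive)."""
    step = timedelta(minutes=cadence_minutes)
    times = []
    current = start
    while current < end:
        times.append(current)
        current += step
    return times


def assign_split(timestamp, val_months=(11,), test_months=(12,)):
    """Assign an observation to a split by calendar month (assumed rule)."""
    if timestamp.month in test_months:
        return "test"
    if timestamp.month in val_months:
        return "val"
    return "train"


if __name__ == "__main__":
    # One day of observations at the 12-minute cadence mentioned above.
    day = make_timestamps(datetime(2014, 1, 1), datetime(2014, 1, 2))
    print(len(day), "frames in one day")
    print(assign_split(datetime(2014, 12, 5)))
```

Splitting by time period rather than at random keeps nearly identical neighboring images from landing in both the training and test sets, which would otherwise make a model look better than it is.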

Figure 2. How cleaned and aligned solar images flow into AI tasks that forecast flares, solar wind, and magnetic activity step by step.

Testing today’s tools on tomorrow’s storms

To show how SuryaBench can be used, the authors run a variety of popular deep learning models on several of these tasks. They demonstrate that a modern neural network can already learn to predict future solar images with good fidelity and can tackle problems such as segmenting active regions, estimating solar wind speed, and classifying strong flares. Performance varies by task and model family, highlighting both the promise of data-driven approaches and the need for further innovation. By reporting common accuracy measures and sharing the code, the project sets a firm baseline for future work.

What this means for life on Earth

In practical terms, SuryaBench is less about building a single perfect forecast today and more about giving the community a shared, high-quality playground in which to build better ones tomorrow. By packaging years of carefully cleaned solar observations together with well-defined prediction problems, the dataset lowers the barrier for solar physicists and machine learning experts to collaborate. As models improve, we can expect steadier guidance on when the Sun is likely to disturb our space-based and ground-based systems. For the general public, this effort brings us closer to treating space weather a bit more like ordinary weather: something we can track, anticipate, and plan for with growing confidence.

Citation: Roy, S., Hegde, D.V., Schmude, J. et al. SuryaBench: Benchmark Dataset for Advancing Machine Learning in Heliophysics and Space Weather Prediction. Sci Data 13, 712 (2026). https://doi.org/10.1038/s41597-026-06552-5

Keywords: space weather, solar flares, heliophysics, machine learning, solar wind