Integrated pixel-wise remote sensing and explainable machine learning for natural hydrogen exploration in southeastern part of Pricaspian Basin, Western Kazakhstan

Why hidden hydrogen matters

Hydrogen is often hailed as a clean fuel of the future, but most of it today is manufactured at considerable energy and financial cost. A quieter story is unfolding beneath our feet: the Earth itself may be naturally producing vast amounts of hydrogen gas. This paper explores how scientists can spot subtle surface clues from space and use transparent, explainable artificial intelligence to narrow down where natural hydrogen might be hiding in the rocks of western Kazakhstan.

Figure 1.

A new kind of energy treasure hunt

Unlike oil and gas, natural hydrogen is lightweight, seeps easily, and leaves only faint fingerprints at the surface. Classical tools like seismic surveys or gravity measurements struggle to detect it directly. Yet in several regions around the world, hydrogen has been linked to specific rock types, deep faults and strange round surface depressions nicknamed “fairy circles.” Western Kazakhstan’s Pricaspian Basin already hosts giant oil fields, thick salt layers that make good seals, and rock types known to generate hydrogen. That geological recipe suggests the area could also store pockets of naturally produced hydrogen, if we can learn how to find them efficiently and cheaply.

Seeing invisible gas from space

The researchers turned to the European Sentinel-2 satellites, which record sunlight reflected from Earth in 13 spectral bands, from visible light to short-wave infrared, including wavelengths sensitive to vegetation, soil moisture and surface minerals. Each satellite image is made of tiny squares, or pixels, 10 meters across, about the size of a small house lot. For every pixel in the Atyrau region of western Kazakhstan, the team computed a set of numerical features: raw color bands, simple indices that track plant health and surface water, and texture measures that capture how rough or uniform the ground appears. These ingredients formed a 22-variable description of each pixel’s surface conditions, without any drilling or local sampling.
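To make the idea of per-pixel features concrete, here is a minimal sketch in Python. It is not the study's code: the summary does not say which indices or texture measures were used, so the NDVI and NDWI formulas, the 3 x 3 local standard deviation, and every function name below are illustrative assumptions. The tiles are synthetic stand-ins for real Sentinel-2 bands.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b), with 0 where the denominator is 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def local_std(band, size=3):
    """Texture proxy: standard deviation in a size x size window per pixel."""
    pad = size // 2
    padded = np.pad(np.asarray(band, dtype=float), pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return windows.std(axis=(-2, -1))


def pixel_features(green, red, nir):
    """Stack per-pixel features into a (n_pixels, n_features) table."""
    ndvi = normalized_difference(nir, red)    # assumed plant-health index
    ndwi = normalized_difference(green, nir)  # assumed surface-water index
    texture = local_std(nir)                  # assumed roughness measure
    layers = [green, red, nir, ndvi, ndwi, texture]
    cube = np.stack([np.asarray(layer, dtype=float) for layer in layers], axis=-1)
    return cube.reshape(-1, cube.shape[-1])


# Synthetic 4 x 4 reflectance tiles standing in for real satellite bands.
green = np.full((4, 4), 0.08)
red = np.full((4, 4), 0.10)
nir = np.full((4, 4), 0.50)
features = pixel_features(green, red, nir)
print(features.shape)  # one row per pixel, one column per feature
```

The study's table had 22 columns per pixel rather than the six shown here, but the layout is the same: each row describes one 10-meter square of ground.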

Teaching machines to recognize subtle patterns

To connect those surface signatures to possible hydrogen seepage, the team used four well-known machine learning methods that excel at classification tasks. They trained these models on millions of pixels labeled by experts as likely hydrogen-related or not, based on prior field surveys and geological insight. Instead of simply declaring a yes-or-no answer for each pixel, the models produced a probability that hydrogen was present. A strict cutoff was then applied so that only very confident pixels were flagged. To boost reliability, the researchers kept only locations where at least three of the four models agreed, and then grouped neighboring pixels into clusters that could represent real hydrogen-prone zones rather than isolated noise specks.
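The voting and clustering steps can be sketched as follows. This is a hedged illustration, not the authors' implementation: the summary does not give the cutoff value or say how neighboring pixels were grouped, so the 0.9 threshold, the four-neighbor flood fill, the minimum cluster size, and the function names are all assumptions.

```python
from collections import deque

import numpy as np


def consensus_mask(probs, cutoff=0.9, min_votes=3):
    """probs has shape (n_models, rows, cols); keep pixels enough models flag."""
    votes = (np.asarray(probs) >= cutoff).sum(axis=0)
    return votes >= min_votes


def label_clusters(mask, min_size=2):
    """Group touching flagged pixels (4-neighbors); drop clusters below min_size."""
    rows, cols = mask.shape
    labels = np.zeros(mask.shape, dtype=int)
    n_clusters = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or labels[r, c] != 0:
                continue
            # Flood fill from this unvisited flagged pixel.
            labels[r, c] = -1  # temporary "visited" marker
            members = [(r, c)]
            queue = deque(members)
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny, nx] and labels[ny, nx] == 0:
                        labels[ny, nx] = -1
                        members.append((ny, nx))
                        queue.append((ny, nx))
            if len(members) >= min_size:
                n_clusters += 1
                value = n_clusters
            else:
                value = -2  # too small: treated as an isolated noise speck
            for y, x in members:
                labels[y, x] = value
    labels[labels < 0] = 0  # discarded specks become background
    return labels, n_clusters


# Made-up probabilities from four hypothetical models on a 4 x 4 tile.
probs = np.full((4, 4, 4), 0.1)
probs[:3, 0, 0] = 0.95  # three models agree
probs[:3, 0, 1] = 0.95  # three models agree, next to the pixel above
probs[:3, 3, 3] = 0.95  # three models agree, but the pixel is isolated
probs[:2, 2, 0] = 0.99  # only two models agree
mask = consensus_mask(probs)
labels, n_clusters = label_clusters(mask)
```

In this toy tile, the two adjacent pixels form one cluster, the isolated pixel is discarded as noise, and the pixel with only two votes never passes the consensus step.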

Figure 2.

Opening the “black box” of artificial intelligence

One of the main concerns with machine learning in the geosciences is trust: if a model says “this spot looks promising,” experts want to know why. The study therefore built explainability directly into the workflow. Using a technique called SHAP (SHapley Additive exPlanations), the authors measured how much each spectral feature pushed a pixel toward or away from a hydrogen prediction. Across the different models, similar patterns emerged. Bands in the near-infrared and short-wave infrared, which are sensitive to vegetation stress, dry mineral crusts and salt-rich surfaces, were consistently the most influential. When these feature importance maps were laid over geological cross-sections and known faults, many high-scoring regions coincided with plausible migration pathways and surface anomalies, lending physical credibility to the machine’s choices.
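The idea behind SHAP can be shown with a brute-force calculation of Shapley values, the quantity SHAP approximates. The sketch below is not the SHAP library and not the study's models: it enumerates every feature combination for a three-feature toy classifier whose weights, feature order, and baseline pixel are invented for illustration. Exact enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline) per feature."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = len(x)

    def value(subset):
        # Features in the subset take the pixel's values; the rest stay at baseline.
        z = baseline.copy()
        idx = np.array(subset, dtype=int)
        z[idx] = x[idx]
        return predict(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


# Hypothetical toy classifier over [near-infrared, short-wave infrared, NDVI].
weights = np.array([1.5, 2.0, -1.0])


def toy_probability(z):
    return 1.0 / (1.0 + np.exp(-(z @ weights - 1.0)))


pixel = np.array([0.6, 0.7, 0.2])       # made-up pixel being explained
background = np.array([0.3, 0.2, 0.5])  # made-up "typical" pixel
phi = shapley_values(toy_probability, pixel, background)
print(phi)  # positive entries pushed the probability up, negative ones down
```

The contributions always add up to the difference between the pixel's predicted probability and the baseline's, which is what lets each prediction be read as a sum of feature "pushes."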

From broad screening to boots on the ground

The resulting maps are not direct proof of hidden hydrogen reservoirs, but they provide a powerful screening tool. The models tend to be generous in flagging potential sites, catching most of the pixels that look hydrogen-like while also producing many false alarms. For early-stage exploration, this trade-off is acceptable: the goal is to shrink a vast frontier region down to a manageable set of targets for field campaigns. In Atyrau, the approach highlights a handful of coherent clusters, some aligned with deep faults and salt “windows,” where gas could plausibly rise from the subsurface. By combining satellite data, pixel-wise machine learning and clear explanations of what drives each prediction, the study offers an interpretable and low-cost roadmap for scouting natural hydrogen in Kazakhstan and other underexplored basins worldwide.
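The trade-off between catching most hydrogen-like pixels and raising false alarms is usually expressed as recall versus precision. The short sketch below shows how the two numbers are computed; the labels are invented for illustration and are not the study's results.

```python
import numpy as np


def precision_recall(y_true, y_pred):
    """Precision: share of flagged pixels that are real. Recall: share of real pixels flagged."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))    # correctly flagged
    fp = int(np.sum(~y_true & y_pred))   # false alarms
    fn = int(np.sum(y_true & ~y_pred))   # missed pixels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Ten made-up pixels: 1 = hydrogen-like, 0 = not.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
flagged = [1, 1, 1, 0, 1, 1, 1, 1, 0, 0]
precision, recall = precision_recall(truth, flagged)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

A "generous" model like this one finds three of the four true pixels but is right in fewer than half of the places it flags, which is tolerable when the next step is a field visit rather than a drilling decision.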

Citation: Wayo, D.D.K., Goliatt, L., Hazlett, R. et al. Integrated pixel-wise remote sensing and explainable machine learning for natural hydrogen exploration in southeastern part of Pricaspian Basin, Western Kazakhstan. Sci Rep 16, 11085 (2026). https://doi.org/10.1038/s41598-026-41845-0

Keywords: natural hydrogen, remote sensing, machine learning, Kazakhstan, energy exploration