Clear Sky Science
Estimating seepage in heterogeneous earthfill dams on permeable foundations using explainable machine learning
Why water seeping through dams matters
Across the world, many communities depend on earthen dams to store drinking water, irrigate crops and tame floods. Yet these seemingly solid structures are constantly leaking. A small, well-controlled trickle is safe; too much hidden seepage can carve tunnels inside the dam or its foundation and trigger sudden failure. This paper explores how modern machine learning can help engineers predict that unseen flow more accurately and quickly, especially for the more realistic case of dams with layered interiors built on permeable ground.
How dams leak below the surface
Earthfill dams are not uniform piles of soil. Often, they have a relatively watertight central core flanked by more porous shells, all resting on a foundation that may itself let water pass. When reservoir water presses against the upstream face, some of it seeps through the dam body and under its base, following complex paths controlled by water depth, soil types and dam geometry. Traditional hand calculations simplify this picture so much that they only work for idealized shapes. Detailed computer simulations based on the finite element method do better, but they can be slow and require specialist expertise, which limits their use in rapid design checks or routine safety reviews.
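To make the idea of an idealized hand calculation concrete, here is a minimal sketch of one classical textbook formula, Dupuit's equation for steady seepage through a uniform rectangular block of soil on an impervious base. It is not the method used in the paper, and the function name and every input value below are assumptions chosen only for illustration. Because it cannot represent a core, shells or a permeable foundation, it shows why such formulas fall short for realistic dams.

```python
def dupuit_seepage(k, h_up, h_down, length):
    """Seepage per metre of dam length (m^2/s) from Dupuit's formula.

    k       hydraulic conductivity of the soil (m/s)
    h_up    upstream water depth above the impervious base (m)
    h_down  downstream water depth above the impervious base (m)
    length  horizontal length of the seepage path (m)
    """
    if length <= 0:
        raise ValueError("length must be positive")
    # q = k * (h_up^2 - h_down^2) / (2 * L)
    return k * (h_up**2 - h_down**2) / (2.0 * length)


# Illustrative (made-up) values for a uniform soil block.
q = dupuit_seepage(k=1e-6, h_up=20.0, h_down=2.0, length=60.0)
print(f"{q:.3e} m^2/s per metre of dam length")
```

The formula treats the whole cross-section as one soil with one permeability, which is exactly the simplification that layered dams on permeable ground violate.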

Teaching computers to recognize seepage patterns
The authors assembled a large digital library of 4,374 seepage scenarios, each produced by a trusted simulation program that computes water flow through and beneath dams. For every case, they varied seven key inputs that engineers can measure or design: crest width, foundation depth, reservoir water level, the lengths of the upstream and downstream slopes, how easily water moves through the foundation, and how much less permeable the core is compared with the shells. For each input combination, the simulation produced one main output: the seepage rate per meter of dam length. This curated dataset became the training ground for several machine learning models that aim to predict seepage directly from the seven inputs.
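A scenario library of this kind can be sketched as a grid of input combinations, each of which would then be passed to the simulation program. The snippet below is only an illustration: the variable names and all level values are hypothetical, and the summary does not state how the authors sampled their inputs. The level counts (six for one input, three for each of the other six) are an assumption chosen because 6 × 3^6 happens to equal 4,374.

```python
from itertools import product

# Hypothetical input levels, for illustration only.
levels = {
    "crest_width_m": [6.0, 8.0, 10.0],
    "foundation_depth_m": [10.0, 20.0, 30.0],
    "reservoir_level_m": [10.0, 15.0, 20.0, 25.0, 30.0, 35.0],
    "upstream_slope_length_m": [60.0, 80.0, 100.0],
    "downstream_slope_length_m": [50.0, 70.0, 90.0],
    "foundation_permeability_m_per_s": [1e-7, 1e-6, 1e-5],
    "core_to_shell_permeability_ratio": [0.001, 0.01, 0.1],
}

# Full-factorial grid: every combination of every level.
scenarios = [dict(zip(levels, combo)) for combo in product(*levels.values())]

print(len(scenarios))   # number of scenarios to simulate
print(scenarios[0])     # first input combination
```

Each dictionary in `scenarios` would correspond to one simulation run, and the simulated seepage rate would become the target value for that row of the training table.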
Finding the most reliable prediction engine
The team tested five families of models widely used on tabular engineering data: single decision trees, random forests and three “boosting” methods that combine many small trees into a powerful ensemble. Rather than guessing model settings, they used Bayesian optimization, a systematic search strategy that treats each trial as an experiment and uses the results of earlier trials to home in on the most promising settings while guarding against overfitting. Performance was judged with a suite of error measures, visual plots comparing predictions with simulation results, and cross-checks using multiple data splits. The clear winner was the categorical gradient boosting (CGB) model, an approach commonly known as CatBoost, which reproduced the simulated seepage with near-perfect accuracy on unseen data, while simpler tree models lagged behind.
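The summary does not list which error measures the authors used, so the sketch below assumes three common ones (root mean square error, mean absolute error and the coefficient of determination R²) along with a simple helper that splits data into folds for repeated checks. The function names and the sample numbers are hypothetical and are not taken from the study.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Made-up "simulated" and "predicted" seepage values.
simulated = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(simulated, predicted), mae(simulated, predicted), r2(simulated, predicted))

folds = k_fold_indices(n=10, k=5)
print(folds)
```

In a cross-check of this kind, each fold is held out in turn while the model is trained on the rest, so every scenario is used once as unseen data.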
Opening the black box of machine learning
To make these sophisticated models useful for dam engineers, the authors needed to explain what drives their predictions. They turned to a modern interpretability tool known as SHAP, which assigns each input a share of responsibility for the output in every scenario. This analysis revealed that two factors dominate seepage behavior: how deep the reservoir water is and how much tighter the core is compared with the surrounding shells. Deeper water strongly increases seepage, while a far less permeable core sharply reduces it. Foundation permeability and foundation depth play secondary roles, and crest width has a modest stabilizing effect. Slope details matter relatively little within the studied ranges, a result that can help focus attention on the most influential design levers.
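The idea of giving each input a share of responsibility comes from Shapley values, which SHAP approximates efficiently for tree models. The sketch below computes exact Shapley values by brute force over all orderings of the inputs, which is feasible only for a handful of features. The toy model, its three inputs and all numbers are hypothetical stand-ins, not the paper's trained model or data.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    For every ordering of the features, switch them one at a time from
    the baseline value to the value in x and record how much the
    prediction changes; then average those changes over all orderings.
    """
    names = list(x)
    phi = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]
            updated = model(current)
            phi[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in phi.items()}


def toy_seepage(f):
    """Hypothetical toy model: more water raises seepage, a tighter core lowers it."""
    return f["water_level"] / f["core_tightness"] + 0.5 - 0.01 * f["crest_width"]


baseline = {"water_level": 20.0, "core_tightness": 10.0, "crest_width": 8.0}
scenario = {"water_level": 30.0, "core_tightness": 100.0, "crest_width": 10.0}

phi = shapley_values(toy_seepage, scenario, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

A useful property to check is that the shares add up to the difference between the prediction for the scenario and the prediction for the baseline, so nothing in the output is left unexplained.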

From research model to everyday tool
To bridge the gap between theory and practice, the best-performing CGB model was wrapped in a simple desktop application. Users can enter the seven dam and foundation properties and instantly obtain a seepage estimate, either singly or for many scenarios at once. The model’s predictions matched both previous numerical studies and an independent case study of Pakistan’s Hub Dam, staying within about 10 percent of established results across several water levels. For non-specialists, the take-home message is reassuring: by combining high-fidelity simulations, advanced machine learning and clear explanations of what matters most, engineers now have a fast and transparent way to anticipate how much water will leak through complex earth dams and to check whether those leaks stay within safe limits.
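A check like the "within about 10 percent" comparison can be sketched as a percent deviation between each predicted value and its reference value. The numbers below are made-up placeholders, not results from the Hub Dam case study, and the function name is hypothetical.

```python
def percent_deviation(predicted, reference):
    """Absolute deviation of a prediction from a reference value, in percent."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return 100.0 * abs(predicted - reference) / abs(reference)


# Placeholder (predicted, reference) seepage pairs for several water levels.
pairs = [(1.05e-5, 1.00e-5), (2.10e-5, 2.25e-5), (3.60e-5, 3.40e-5)]

deviations = [percent_deviation(p, r) for p, r in pairs]
within_tolerance = all(d <= 10.0 for d in deviations)
print([round(d, 2) for d in deviations], within_tolerance)
```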
Citation: Sayed Ahmed, M.M., Khursheed, M.Z., Alshameri, B. et al. Estimating seepage in heterogeneous earthfill dams on permeable foundations using explainable machine learning. Sci Rep 16, 12060 (2026). https://doi.org/10.1038/s41598-026-45048-5
Keywords: earthfill dams, seepage, machine learning, dam safety, geotechnical engineering