Clear Sky Science

Physically interpretable residual strength prediction of corroded pipelines via symbolic Bayesian networks

Why pipeline safety matters to everyone

Modern life depends on vast networks of buried and underwater pipes that quietly move gas and oil over long distances. When these pipelines corrode, their metal walls thin and can eventually rupture, causing explosions, fires, and pollution. Engineers try to predict how much strength a damaged pipe still has so they can repair or replace it in time. This paper introduces a new way to make those predictions that not only is highly accurate but also explains its reasoning in clear, physics‑like formulas that engineers can inspect and trust.

Figure 1.

The hidden dangers inside aging pipes

Pressurized steel pipelines are often called the lifelines of energy infrastructure, but they constantly battle harsh environments. Corrosion slowly eats away at the pipe wall, creating pits and grooves that weaken it. If the internal pressure climbs too high, a corroded section can burst. Traditional engineering formulas estimate the remaining strength of such pipes, but they are often conservative and do not generalize well to different pipe sizes, materials, or defect shapes. More sophisticated numerical simulations are accurate but time‑consuming and must be redone whenever conditions change. This creates a difficult trade‑off between speed, precision, and practicality in day‑to‑day safety assessments.
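
The article does not say which traditional formulas it has in mind. One widely used example is the original ASME B31G method, and the Python sketch below follows its commonly quoted form. Treat it as an assumption about the kind of formula meant, not as something taken from the paper. The function name, argument names, and units are illustrative.

```python
import math

def b31g_burst_pressure(D, t, d, L, smys):
    """Estimate failure pressure using the commonly quoted original B31G form.

    D: outer diameter, t: wall thickness, d: defect depth, L: defect length
    (all in the same length unit); smys: specified minimum yield strength.
    The returned pressure has the same unit as smys.
    """
    if not 0 <= d < t:
        raise ValueError("defect depth must satisfy 0 <= d < t")
    flow_stress = 1.1 * smys               # B31G flow-stress assumption
    intact = 2.0 * flow_stress * t / D     # thin-wall (Barlow) pressure at flow stress
    z = L * L / (D * t)                    # dimensionless defect length
    if z <= 20.0:
        M = math.sqrt(1.0 + 0.8 * z)       # bulging (Folias) factor
        ratio = (2.0 / 3.0) * d / t        # parabolic metal-loss approximation
        return intact * (1.0 - ratio) / (1.0 - ratio / M)
    # Long defects: metal loss treated as a full-depth rectangle
    return intact * (1.0 - d / t)
```

A formula like this takes only a handful of inputs and ignores defect width entirely, which is one reason such rules tend to be conservative and to transfer poorly across pipe sizes and defect shapes.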

Black‑box AI is not enough for safety

Recent advances in machine learning have shown that computers can learn complex patterns linking pipe geometry, material properties, and defect size to the pressure at which a pipe will fail. Methods like neural networks and ensemble tree models already outperform simple formulas. However, they usually act as black boxes: they provide predictions without revealing the physical reasoning behind them. In safety‑critical applications, such as deciding whether to keep a pipeline segment in service, engineers and regulators need more than an answer—they need to understand why that answer makes sense. Post‑hoc explanation tools can offer hints, but they do not replace a clear, compact equation grounded in engineering intuition.

Figure 2.

A new blend of learning and human‑readable rules

The authors propose a framework called Symbolic Bayesian Networks (SyBN) that aims to combine the best of both worlds: high predictive accuracy and human‑readable insight. SyBN has two main branches that work in parallel. One branch is a Bayesian neural network that assigns probabilistic weights to each input feature—pipe diameter, wall thickness, strength of the steel, and the depth, length, and width of corrosion defects. This branch learns the complex, nonlinear relationships in the data and quantifies how uncertain its predictions are, especially in regions where there are few measurements. The second branch is a deep symbolic regression module that tries to express the same relationships as simple mathematical expressions made from basic operations like addition, subtraction, multiplication, and division. An adaptive “gate” between these branches decides, sample by sample, how strongly to force the symbolic part to match the neural network while still keeping the expressions compact and physically reasonable.
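
One way to picture the gated coupling described above is as a loss with three parts: how well the symbolic expression fits the data, how closely it agrees with the neural network (weighted per sample by the gate), and a penalty on expression size. The Python sketch below is a minimal illustration under stated assumptions. The article does not give the gate's formula, so the exponential gate, the `scale` and `lam` parameters, and all function names here are hypothetical and not the authors' implementation.

```python
import math

def gate(nn_std, scale=1.0):
    """Hypothetical gate in (0, 1]: close to 1 where the network is confident
    (small predictive standard deviation), smaller where it is uncertain."""
    return math.exp(-nn_std / scale)

def gated_loss(y_true, y_sym, nn_mean, nn_std, complexity, lam=0.01):
    """Data fit + gated agreement with the network + complexity penalty.

    y_sym: predictions of the symbolic expression
    nn_mean, nn_std: predictive mean and standard deviation of the network
    complexity: size of the expression, e.g. its number of operators (assumed)
    """
    n = len(y_true)
    fit = sum((ys - yt) ** 2 for ys, yt in zip(y_sym, y_true)) / n
    agree = sum(gate(s) * (ys - m) ** 2
                for ys, m, s in zip(y_sym, nn_mean, nn_std)) / n
    return fit + agree + lam * complexity
```

With this choice, samples where the network is uncertain pull the symbolic expression less strongly toward the network's output, while the complexity term keeps the expression short.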

Putting the method to the test

To evaluate SyBN, the researchers used a benchmark dataset of 453 corroded pipeline cases collected from full‑scale burst experiments and carefully calibrated computer simulations. Each data point includes eight input parameters describing the pipe and its defects and the measured burst pressure. The data are challenging: pipe diameters span more than an order of magnitude, defect shapes vary widely, and the target burst pressures have large variability. When SyBN was compared against standard models—including linear and ridge regression, support vector regression, k‑nearest neighbors, random forests, gradient‑boosted trees, and XGBoost—it achieved the best performance on all common error measures. It also produced more stable results across repeated runs, thanks to its Bayesian treatment of feature importance and the regularizing effect of the symbolic branch.
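
The article does not list the error measures used. The sketch below assumes three that are standard for regression benchmarks of this kind: mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²). The function name is illustrative.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, and R^2 for paired lists of measured and predicted values."""
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    return {"mae": mae, "rmse": rmse, "r2": 1.0 - ss_res / ss_tot}
```

Lower MAE and RMSE and higher R² indicate a better fit. Computing all three on the same held-out cases for every model is what makes a comparison like the one above meaningful.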

Seeing which factors matter most

The team also examined how SyBN judges the importance of different inputs. The Bayesian neural network naturally learns which features it relies on most, and these weights were checked against SHAP, a widely used method for interpreting machine learning models. Both views agreed that pipe wall thickness is the dominant factor for burst pressure, followed by the stiffness of the steel and the length of the defect, while ultimate tensile strength and defect width play smaller roles. This alignment between two independent interpretation methods increases confidence that the model is capturing real physical effects rather than spurious patterns, and the symbolic expressions it produces give engineers direct formulas they can inspect, test, and even embed into design rules.
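
Agreement between two importance rankings can be quantified with a rank correlation. The sketch below computes Spearman's coefficient for two lists of importance scores over the same features. It is an illustration of how such a check could be done, not the procedure reported in the paper, and it assumes no tied scores.

```python
def ranks(scores):
    """Rank 1 = largest score. Assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(a, b):
    """Spearman rank correlation between two equally long score lists."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))  # squared rank differences
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```

A value of 1 means the two methods order the features identically, and a value of -1 means the orderings are exactly reversed.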

What this means for safer pipelines

In simple terms, this work shows that it is possible to build an AI system that predicts when a corroded pipeline might fail while also explaining its reasoning in equations an engineer can read. SyBN outperforms existing machine learning approaches on accuracy, provides realistic uncertainty bands around its predictions, and highlights which pipe features matter most. Although the current study focuses on static snapshots of corrosion rather than how damage grows over time, the framework points toward future monitoring systems that combine real‑time sensor data with transparent, trustworthy models. For the public, this translates into a better‑informed basis for maintenance decisions—and ultimately, fewer unexpected pipeline failures.

Citation: Chen, M., Zhang, Y., Ye, Y. et al. Physically interpretable residual strength prediction of corroded pipelines via symbolic Bayesian networks. Sci Rep 16, 8151 (2026). https://doi.org/10.1038/s41598-026-41914-4

Keywords: pipeline corrosion, structural health monitoring, interpretable machine learning, symbolic regression, infrastructure safety