Clear Sky Science

Improving end-bearing capacity prediction of rock-socketed shafts using Gaussian-augmented optimized extreme gradient boosting models

Building on Rock Instead of Guesswork

When engineers design bridges and high‑rise buildings, they often rely on deep foundations that extend down into solid rock. The strength of these “rock‑socketed shafts” is crucial for safety and cost, yet their true capacity at the base is hard to measure directly. This study shows how modern machine‑learning tools, combined with smart data‑generation tricks, can give engineers much sharper estimates of how much load these deep foundations can safely carry—potentially saving money on construction while keeping structures secure.

Why Deep Foundations Are So Hard to Judge

Rock‑socketed shafts are large concrete columns drilled through weaker soil and anchored into stronger rock. In theory, the harder the rock and the better the construction, the more weight a shaft can support at its tip. In practice, things are messy: mud and slurry can accumulate at the bottom of the hole, roughness and shape of the socket vary, and hidden voids or cracks in the rock are difficult to see. Because of these uncertainties, designers often play it safe by assuming little or no support from the shaft tip, which leads to longer, more expensive foundations than may be necessary.

From Simple Formulas to Smarter Predictions

Past methods for estimating shaft capacity have relied on simplified equations or traditional computer models. These usually focus on a handful of properties—such as the compressive strength of the rock—and treat the rock mass in an idealized way. Over the last few years, researchers have started using artificial‑intelligence techniques to learn directly from databases of load tests, where shafts have been pushed until their behavior is well understood. These approaches can juggle many inputs at once, including shaft diameter, depths in soil and rock, and measures of rock quality, but they are also “black boxes” that can overfit when data are limited.

Figure 1.

Feeding the Algorithm with Real and Synthetic Data

The authors built on a published set of 151 rock‑socketed shaft tests that recorded the end‑bearing factor (a measure of how much load the tip can carry) along with eight descriptive features. After cleaning the data to remove outliers and incomplete records, they kept 136 real shafts. To overcome the small sample size, a common issue in geotechnical engineering, they then created additional “synthetic” records by adding small random Gaussian noise to the existing ones. This produced a larger, statistically consistent set of 460 records that preserved the original patterns while offering more variety for training machine‑learning models.
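The idea of Gaussian augmentation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name, the toy records, the noise scale (`rel_sigma`), the number of noisy copies, and the choice to perturb the target column along with the features are all assumptions made for the example.

```python
import random

def augment_with_gaussian_noise(records, copies=2, rel_sigma=0.05, seed=0):
    """Return the original records plus `copies` noisy variants of each one.

    Every column gets zero-mean Gaussian noise whose standard deviation is
    `rel_sigma` times that column's standard deviation across the dataset.
    (Both parameters are illustrative assumptions.)
    """
    rng = random.Random(seed)
    n = len(records)
    n_cols = len(records[0])
    # Per-column mean and population standard deviation, used to scale the noise.
    means = [sum(r[j] for r in records) / n for j in range(n_cols)]
    stds = [
        (sum((r[j] - means[j]) ** 2 for r in records) / n) ** 0.5
        for j in range(n_cols)
    ]
    augmented = [list(r) for r in records]  # keep the real records unchanged
    for _ in range(copies):
        for r in records:
            augmented.append(
                [r[j] + rng.gauss(0.0, rel_sigma * stds[j]) for j in range(n_cols)]
            )
    return augmented

# Hypothetical toy records: [rock strength (MPa), shaft diameter (m), end-bearing factor]
real = [[12.0, 1.2, 3.1], [25.0, 0.9, 2.4], [40.0, 1.5, 1.8], [8.0, 1.0, 3.6]]
expanded = augment_with_gaussian_noise(real, copies=2)
print(len(real), "->", len(expanded))
```

Scaling the noise to each column's own spread keeps the perturbations small relative to the data, which is one simple way to preserve the original patterns while adding variety.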

Training and Tuning the Learning Machines

The team focused on an algorithm called Extreme Gradient Boosting, or XGBoost, which combines many simple decision trees into a powerful predictor. To squeeze the best performance from XGBoost, they coupled it with three metaheuristic optimization schemes inspired by arithmetic operators, human brainstorming, and the hunting strategy of whales. These optimizers automatically tuned key settings, such as tree depth and learning rate, to balance fitting the known data against overfitting. Among the variants, the XGBoost model tuned by the Arithmetic Optimization Algorithm (XGBoost_AOA) emerged as the most accurate and stable.
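A rough sketch of the two ideas in this step, boosting and automatic tuning, is given below. It deliberately swaps in simpler stand-ins: plain gradient boosting with one-split trees ("stumps") written in pure Python instead of the XGBoost library, and random search instead of the Arithmetic Optimization Algorithm. The toy data, parameter ranges, and function names are assumptions for illustration only.

```python
import random

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the maximum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]

def boost(xs, ys, n_rounds, learning_rate):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)

def rmse(model, xs, ys):
    return (sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)) ** 0.5

# Hypothetical one-feature toy problem with a held-out validation half.
rng = random.Random(1)
xs = [i / 4 for i in range(40)]
ys = [x ** 0.5 + rng.gauss(0.0, 0.1) for x in xs]
train_x, train_y = xs[0::2], ys[0::2]
valid_x, valid_y = xs[1::2], ys[1::2]

# Random search over two settings; the score is the error on unseen data,
# so settings that merely memorize the training half are penalized.
best_score, best_params = None, None
for _ in range(20):
    params = {"n_rounds": rng.randint(5, 60), "learning_rate": rng.uniform(0.05, 0.5)}
    score = rmse(boost(train_x, train_y, **params), valid_x, valid_y)
    if best_score is None or score < best_score:
        best_score, best_params = score, params

print(best_params, round(best_score, 3))
```

A metaheuristic such as AOA replaces the random draws with a guided search, but the loop has the same shape: propose settings, train, score on held-out data, keep the best.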

What the Models Learned About Rock and Shafts

Using only the original 136 shafts, the optimized model already outperformed earlier methods. When trained on the expanded 460‑record set, its accuracy improved dramatically: prediction errors shrank to a fraction of their former size, and the match between predicted and observed capacities came very close to an ideal one‑to‑one line. The analysis also revealed which inputs mattered most. Rock compressive strength and a rock‑mass rating were the dominant predictors, while shaft diameter and overall load level also played strong roles. Measures that are closely related to each other, such as two different rock‑quality scores, were found to be highly redundant, highlighting how overlapping information can encourage overfitting if not handled carefully.
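Two kinds of checks underlie statements like these: how closely predictions track observations, and how strongly two inputs overlap. The sketch below uses common choices for each (root-mean-square error, the coefficient of determination R², and the Pearson correlation); the article does not say which metrics the authors reported, and all numbers here are made-up toy values.

```python
def rmse(observed, predicted):
    """Root-mean-square error: typical size of a prediction miss."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5

def r_squared(observed, predicted):
    """Coefficient of determination: 1.0 means points lie on the one-to-one line."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def pearson(a, b):
    """Pearson correlation: values near +1 or -1 flag redundant inputs."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

# Hypothetical observed vs. predicted end-bearing values.
observed = [2.0, 3.0, 4.0, 5.0]
predicted = [2.1, 2.9, 4.2, 4.8]
print("RMSE:", round(rmse(observed, predicted), 3))
print("R^2:", round(r_squared(observed, predicted), 3))

# Hypothetical pair of rock-quality scores measured on the same four sites.
score_a = [40.0, 55.0, 70.0, 90.0]
score_b = [35.0, 48.0, 60.0, 80.0]
print("correlation:", round(pearson(score_a, score_b), 3))
```

When two inputs correlate this strongly, they carry nearly the same information, so a model gains little from having both and may fit noise in the small differences between them.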

Figure 2.

From Research Code to Practical Tool

To make the results usable outside the lab, the authors wrapped their best‑performing model into an easy‑to‑use computer interface. Engineers can enter basic shaft and rock parameters and receive an immediate estimate of tip capacity, along with evidence that the model has been checked against independent case histories. While the approach still depends on the quality and range of the underlying data, it demonstrates how combining machine learning, synthetic data generation, and interpretability tools can turn scattered test results into a practical design aid—helping to reduce guesswork, trim unnecessary conservatism, and design safer, more economical foundations.

Citation: Khatti, J., Fissha, Y. & Cheepurupalli, N. Improving end-bearing capacity prediction of rock-socketed shafts using Gaussian-augmented optimized extreme gradient boosting models. Sci Rep 16, 7664 (2026). https://doi.org/10.1038/s41598-026-38646-w

Keywords: rock-socketed shafts, deep foundations, machine learning, data augmentation, geotechnical engineering