Clear Sky Science
Factor of safety prediction for high road embankments using mixed effects random forest and bee colony optimization
Why the stability of road embankments matters
When you drive along a highway built on a raised earth mound, you are trusting that this man‑made hill will not suddenly give way. The safety of these high road embankments is judged using a number called the “factor of safety,” which compares the forces keeping the soil in place to the forces trying to make it slide. Traditionally, engineers have relied on hand calculations or heavy computer simulations to estimate this factor. This study shows how modern machine learning can make those predictions faster and more reliable, potentially reducing the risk of catastrophic slope failures that threaten people, property, and transport networks.
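As a minimal illustration of that ratio, the sketch below divides a resisting force by a driving force along one trial slip surface. The function name and the numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the study.

```python
def factor_of_safety(resisting: float, driving: float) -> float:
    """Ratio of the forces holding the soil in place to the forces trying to move it."""
    if driving <= 0:
        raise ValueError("driving force must be positive")
    return resisting / driving


# Hypothetical forces along one trial slip surface, for illustration only.
fos = factor_of_safety(resisting=1500.0, driving=1000.0)
print(fos)  # 1.5
```

A value above 1 means the resisting forces exceed the driving forces on that surface; a value below 1 means the slope would be expected to slide.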
Building thousands of virtual embankments
To train and test their models, the researchers first created a large, realistic dataset using advanced numerical simulations instead of relying only on a few real‑world case studies. They modeled road embankments between 6 and 30 meters high with many different slope shapes, including stepped designs that use horizontal benches called berms to improve stability. They varied key soil properties—such as how heavy the soil is, how much water it contains, how stiff it is, how much it resists sliding, and how cohesive it is—along with the strength of the foundation soil beneath the embankment. For each of 1,176 scenarios, a finite element program calculated the factor of safety and searched for the most likely slip surface, providing a trusted “ground truth” against which machine learning predictions could be judged.

From classic models to smarter forests
The team then compared three kinds of data‑driven models. The first was the well‑known Random Forest method, which combines many decision trees to make robust predictions. The second, called Mixed Effects Random Forest, extends this idea by explicitly accounting for grouped or “clustered” data—exactly the situation in geotechnical work, where sets of measurements may come from the same site, soil type, or construction phase. Finally, they introduced a new hybrid approach: Artificial Bee Colony‑optimized Mixed Effects Random Forest (ABC‑MERF). Here, a swarm‑inspired optimization algorithm, modeled on how bees search for food, automatically tunes the many settings of the mixed‑effects forest to squeeze out better performance without trial‑and‑error guessing by the engineer.
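To give a feel for the bee-inspired search, here is a minimal sketch of a generic Artificial Bee Colony loop in plain Python. It is not the authors' implementation: the function `abc_minimize`, its default settings, and the toy objective are all assumptions made for illustration. In the study, the quantity being minimized would be a prediction error of the mixed-effects forest for a given set of model settings; here a simple bowl-shaped function stands in for it.

```python
import random


def abc_minimize(objective, bounds, n_sources=10, limit=20, n_iter=100, seed=0):
    """Minimal Artificial Bee Colony search minimizing `objective` inside `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    sources = [random_source() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources  # consecutive failed improvements per source

    def try_improve(i):
        # Move one coordinate of source i relative to a different random source k.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = list(sources[i])
        cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        lo, hi = bounds[d]
        cand[d] = min(max(cand[d], lo), hi)  # stay inside the search box
        cost = objective(cand)
        if cost < costs[i]:  # greedy selection: keep only improvements
            sources[i], costs[i], trials[i] = cand, cost, 0
        else:
            trials[i] += 1

    def fitness(cost):
        # Common ABC fitness transform: larger is better for any sign of cost.
        return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)

    best_i = min(range(n_sources), key=lambda j: costs[j])
    best, best_cost = list(sources[best_i]), costs[best_i]

    for _ in range(n_iter):
        for i in range(n_sources):  # employed bees: one step per source
            try_improve(i)
        weights = [fitness(c) for c in costs]
        for _ in range(n_sources):  # onlooker bees: favor better sources
            try_improve(rng.choices(range(n_sources), weights=weights)[0])
        for i in range(n_sources):  # scout bees: abandon stalled sources
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = objective(sources[i])
                trials[i] = 0
        best_i = min(range(n_sources), key=lambda j: costs[j])
        if costs[best_i] < best_cost:  # remember the best solution seen so far
            best, best_cost = list(sources[best_i]), costs[best_i]
    return best, best_cost


# Hypothetical stand-in for "prediction error as a function of two scaled settings".
def toy_validation_error(x):
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2


best, best_cost = abc_minimize(toy_validation_error, bounds=[(0.0, 1.0), (0.0, 1.0)])
print(best, best_cost)
```

Swapping the toy function for one that trains a model with the candidate settings and returns its validation error would turn this into a hyperparameter tuner, at the cost of one model fit per evaluation.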
Cleaning the data and testing the predictions
Before training the models, the researchers carefully prepared the data. They identified extreme outliers using a standard box‑plot method and capped them at reasonable limits so that rare odd values would not distort the learning process. All inputs were then scaled between 0 and 1, which suits the bee‑based optimizer and keeps different variables comparable. The data were split into training and testing sets, and the models were evaluated with several measures, including how closely predictions matched the simulated safety factors and how much of the variation in the data the models could explain. Additional checks, such as residual plots and statistical tests, were used to confirm that the models were not just memorizing the training data but were genuinely learning the underlying patterns.
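The two preparation steps can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the whisker multiplier of 1.5 is the usual box-plot convention and is assumed here, the helper names are made up, and the cohesion values are invented.

```python
import statistics


def cap_outliers_iqr(values, k=1.5):
    """Cap values outside the box-plot whiskers [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in values]


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical cohesion values with one extreme entry, for illustration only.
cohesion = [10, 12, 14, 15, 16, 18, 20, 95]
capped = cap_outliers_iqr(cohesion)
scaled = min_max_scale(capped)
print(capped)
print(scaled)
```

In this example the extreme value is pulled down to the upper whisker instead of being deleted, so the scenario stays in the dataset while its influence on the scaling is limited.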

What the models learned about soil and slopes
All three approaches performed impressively, but the ABC‑MERF model came out on top. It explained over 99 percent of the variation in the factor of safety and kept typical prediction errors to around two percent of the safety range. Just as important, the model’s behavior made physical sense. Analyses of feature importance and response curves showed that the internal friction angle of the embankment soil and the height of the embankment were the most influential factors, followed by slope steepness, cohesion, and the use of berms. Higher friction angles and greater cohesion increased stability, while taller embankments and steeper slopes reduced it—exactly what basic soil mechanics predicts. This agreement between data‑driven results and engineering theory is crucial if practitioners are to trust machine learning tools in safety‑critical design.
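The two headline numbers can be made concrete with a short sketch. "Variation explained" corresponds to the coefficient of determination, R². The article does not say exactly how the error relative to the safety range was computed, so the second function assumes a root-mean-square error divided by the range of the true values. The sample values are invented and are not results from the study.

```python
import math


def r_squared(y_true, y_pred):
    """Share of the variation in y_true that the predictions account for."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse_percent_of_range(y_true, y_pred):
    """Root-mean-square error as a percentage of the range of y_true (assumed metric)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return 100.0 * math.sqrt(mse) / (max(y_true) - min(y_true))


# Hypothetical simulated and predicted safety factors, for illustration only.
simulated = [1.0, 1.5, 2.0, 2.5]
predicted = [1.1, 1.4, 2.1, 2.4]
print(r_squared(simulated, predicted))
print(rmse_percent_of_range(simulated, predicted))
```

An R² close to 1 together with a small percentage error is the pattern the article reports for the best model.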
From research tool to engineering assistant
The study concludes that a carefully designed hybrid of mixed‑effects random forests and bee‑inspired optimization can provide highly accurate, physically meaningful predictions of the factor of safety for high road embankments. For a lay reader, the key message is that engineers can now combine detailed virtual testing with advanced machine learning to quickly screen many design options and highlight risky configurations before they are built. While such models do not replace expert judgment or site‑specific investigations—especially under earthquakes or heavy rain—they offer a powerful decision‑support tool to help keep the embankments beneath our roads stable and safe over their long service lives.
Citation: Boufarh, R., Boursas, F., Bakri, M. et al. Factor of safety prediction for high road embankments using mixed effects random forest and bee colony optimization. Sci Rep 16, 6003 (2026). https://doi.org/10.1038/s41598-026-35431-7
Keywords: slope stability, road embankments, factor of safety, machine learning, geotechnical engineering