Clear Sky Science
Proactive soft-failure prediction in optical transport networks via physics-inspired features and Infrastructure-as-Code orchestration
Why hidden cracks in the internet matter
Most of the world’s internet traffic travels through hair-thin glass fibers that quietly span continents and oceans. When these optical highways falter, even for a moment, banks, hospitals, and emergency services feel the shock. Today, many of these networks only react after problems become serious enough to disrupt service. This study explores a way to spot subtle warning signs in the signals themselves so operators can step in before online connections blink off.

From waiting for trouble to staying ahead of it
Current practice in optical transport networks is largely reactive. Devices monitor a key quality measure of the light signal and raise an alarm only once it drops below a fixed limit. By the time that happens, traffic is already at risk, and operators rush to move data to healthier paths. The authors propose a proactive approach: estimate how long remains before a link becomes unusable, then trigger a smooth traffic shift while there is still a safe time cushion. The approach targets gradual problems, such as aging amplifiers and growing distortions in the fiber, rather than sudden cuts or power losses that no early warning can predict.
Teaching machines to read signal health
To forecast failure, the team feeds a learning algorithm short histories of a standard signal metric along with several simple statistics built from it. Instead of relying only on the current quality level, they also include how fast it is changing, how that rate itself is changing, and how noisy or stable the recent past has been. These added features are “physics-inspired” because they mirror how engineers think about wear, drift, and instability in real equipment, while the learning task itself stays purely data driven. A popular tree-based method called a Random Forest turns these patterns into a prediction of the time remaining before the signal crosses a critical threshold.
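The feature idea can be sketched in a few lines of Python. The window length, the feature names, and the choice to measure volatility as the spread of step-to-step changes are assumptions for illustration; the paper's exact definitions may differ.

```python
# Sketch of "physics-inspired" features from a short window of a
# signal-quality metric. Feature names and definitions are hypothetical
# illustrations, not the authors' exact feature set.
from statistics import fmean, pstdev


def physics_inspired_features(window, dt=1.0):
    """Return a feature dict describing the most recent sample in `window`.

    window: at least 3 metric readings, oldest first, spaced dt seconds apart.
    """
    if len(window) < 3:
        raise ValueError("need at least 3 samples for a second difference")
    # Level: the current signal quality.
    level = window[-1]
    # First difference: how fast the metric is changing (units per second).
    rate = (window[-1] - window[-2]) / dt
    # Second difference: how that rate itself is changing (units per second^2).
    accel = (window[-1] - 2 * window[-2] + window[-3]) / dt**2
    # Volatility: spread of the step-to-step changes across the window.
    diffs = [b - a for a, b in zip(window, window[1:])]
    volatility = pstdev(diffs)
    return {
        "level": level,
        "rate": rate,
        "accel": accel,
        "volatility": volatility,
        "window_mean": fmean(window),
    }


if __name__ == "__main__":
    print(physics_inspired_features([5, 4, 3, 2]))
```

Each feature row, paired with the time that actually elapsed before the threshold was crossed, would become one training example for a Random Forest regressor (scikit-learn's `RandomForestRegressor` is one widely used implementation). Measuring volatility on the differences rather than on the raw levels keeps a steady downward trend from being mistaken for noise: the evenly falling window above has a rate of -1 per second, zero acceleration, and zero volatility.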
Testing the approach in simulation and on realistic network data
The authors validate their method in two very different settings. First, they build a controlled simulation that mimics several types of gradual degradation, from smooth exponential decline to more erratic, oscillating behavior. Here, the model predicts the remaining safe time with an average error of under 20 seconds. Second, they test on a large public dataset that imitates the behavior of hundreds of real optical paths, covering different kinds of failures as well as healthy links. In this more challenging environment the typical error is about 73 seconds, still early enough to act ahead of trouble and roughly six times more accurate than the simple rule-based methods many operators use today.
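The simulated setting can be illustrated with a small generator of degradation curves and their ground-truth labels. The two curve shapes echo the exponential and oscillating behavior described above, but every parameter value, the 6 dB threshold, and the one-second sampling are hypothetical. The error figures quoted in the study come from its own simulator and dataset, not from this sketch.

```python
# Sketch of a synthetic soft-failure simulator with ground-truth labels.
# Curve shapes, parameter values, the threshold, and 1 s sampling are
# hypothetical choices for illustration only.
import math
import random

THRESHOLD_DB = 6.0  # hypothetical failure threshold


def exponential_decline(n, start=10.0, floor=4.0, tau=300.0, noise=0.05, seed=0):
    """Metric decays smoothly from `start` toward `floor`, time constant tau (s)."""
    rng = random.Random(seed)
    return [floor + (start - floor) * math.exp(-t / tau) + rng.gauss(0.0, noise)
            for t in range(n)]


def oscillating_decline(n, start=10.0, slope=-0.01, amp=0.4, period=60.0,
                        noise=0.1, seed=0):
    """Linear drift plus a superimposed oscillation and heavier noise."""
    rng = random.Random(seed)
    return [start + slope * t + amp * math.sin(2 * math.pi * t / period)
            + rng.gauss(0.0, noise) for t in range(n)]


def time_to_failure_labels(series, threshold=THRESHOLD_DB):
    """Seconds from each sample to the first reading below threshold.

    Returns None for every sample if the series never crosses the threshold.
    """
    first = next((i for i, q in enumerate(series) if q < threshold), None)
    if first is None:
        return [None] * len(series)
    return [max(0, first - t) for t in range(len(series))]


def mean_absolute_error(predicted, actual):
    """Average absolute gap (seconds) over samples that have a true label."""
    pairs = [(p, a) for p, a in zip(predicted, actual) if a is not None]
    if not pairs:
        raise ValueError("no labeled samples to score")
    return sum(abs(p - a) for p, a in pairs) / len(pairs)


if __name__ == "__main__":
    labels = time_to_failure_labels(exponential_decline(400, noise=0.0))
    print(labels[0], labels[100], labels[330])
    # 330 230 0
```

With noise switched off, the exponential curve first drops below 6 dB at t = 330 s, so the first sample is labeled 330 seconds to failure. The `mean_absolute_error` helper shows one common way to compute an average prediction error of the kind reported above.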
Explaining decisions and wiring them into the network
Because network operators must trust automated alarms, the authors add a tool that explains which input factors drove each warning. In several case studies, the explanations highlight exactly what an engineer would expect: the current signal quality and its recent trend dominate the call, while short-term fluctuations help distinguish real decline from harmless noise. The prediction system is then tied into a modern “infrastructure-as-code” control loop. When the projected time-to-failure drops below a chosen safety margin and stays there for a few readings, the system writes a new desired network layout into a version-controlled configuration. Cloud-style software tools detect this change and carry out a make-before-break move of traffic to a healthier path, setting up the new route before releasing the old one, all in about seven seconds of processing time.
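The persistence rule and the hand-off to a version-controlled configuration can be sketched as follows. The 120-second margin, the three-reading streak, and the layout of the desired-state document are hypothetical. Committing the document to a repository and having a reconciliation tool apply it is outside this sketch.

```python
# Sketch of the decision step in an infrastructure-as-code loop: act only
# after the predicted time-to-failure stays below a safety margin for several
# consecutive readings, then render a new desired-state document.
# Margin, streak length, and document fields are hypothetical.
import json
from dataclasses import dataclass, field


@dataclass
class ReroutePolicy:
    safety_margin_s: float = 120.0  # hypothetical lead time required
    required_consecutive: int = 3   # hypothetical "few readings"
    _below_count: int = field(default=0, init=False, repr=False)

    def update(self, predicted_ttf_s: float) -> bool:
        """Return True once per streak, on the reading that completes it."""
        if predicted_ttf_s < self.safety_margin_s:
            self._below_count += 1
        else:
            self._below_count = 0  # a healthy reading resets the streak
        return self._below_count == self.required_consecutive


def desired_state(service: str, active_path: str, backup_path: str) -> str:
    """Render a desired-state document moving `service` to `backup_path`.

    Keeping the old path as 'draining' expresses make-before-break intent:
    the new path carries traffic before the old one is released.
    """
    doc = {
        "service": service,
        "strategy": "make-before-break",
        "paths": [
            {"id": backup_path, "role": "active"},
            {"id": active_path, "role": "draining"},
        ],
    }
    return json.dumps(doc, indent=2, sort_keys=True)


if __name__ == "__main__":
    policy = ReroutePolicy()
    for ttf in [300, 100, 90, 200, 110, 100, 95]:
        if policy.update(ttf):
            print(desired_state("svc-1", "path-A", "path-B"))
```

Requiring several consecutive low predictions keeps a single noisy estimate from moving traffic: in the usage loop, the dip at readings two and three is cancelled by the healthy 200-second reading, and only the later run of three low values produces a new desired state.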

What this means for everyday connectivity
For non-specialists, the message is simple: it is becoming possible to treat parts of the internet more like a car that warns you before a breakdown than one that simply dies on the highway. By combining basic physics insight, transparent machine learning, and automated control software, this work shows that gradual, signal-based failures in optical networks can often be anticipated with enough lead time to move traffic without users noticing. Sudden breaks and certain hidden fault types still require other safeguards, but proactive prediction can reduce costly outages and make the digital services people rely on every day more quietly dependable.
Citation: Ali, O.M., Radwan, A.M.A., Radwan, O.M.A. et al. Proactive soft-failure prediction in optical transport networks via physics-inspired features and Infrastructure-as-Code orchestration. Sci Rep 16, 16139 (2026). https://doi.org/10.1038/s41598-026-52186-3
Keywords: optical networks, failure prediction, machine learning, network reliability, infrastructure as code