Clear Sky Science · en

Water quality index prediction via a robust machine learning model using oxygen-related indices for river water quality monitoring

· Back to index

Why River Oxygen Matters to Everyone

Clean rivers are not just scenic backdrops; they are drinking water sources, irrigation lifelines, and habitats for fish and wildlife. Yet many rivers around the world are slowly suffocating as pollution strips oxygen from the water. This study introduces a new, smarter way to keep watch over river health by using a handful of oxygen-related measurements and machine learning to predict an easy-to-understand water quality score. The goal is to give communities and decision-makers a fast, reliable tool to spot trouble before rivers reach a crisis point.

Figure 1
Figure 1.

A Simple Score for a Complex River

Water scientists often compress dozens of chemical and biological measurements into a single Water Quality Index, or WQI. This score lets non-specialists see at a glance whether water is excellent, good, moderate, or poor. However, many versions of WQI either treat oxygen only indirectly or do not fully exploit how central oxygen is to aquatic life. Oxygen tells us whether fish can breathe, whether microbes are breaking down waste, and whether a river can recover after a pollution event. The authors argue that a smarter index should lean heavily on oxygen-related information, which is widely measured and directly tied to the survival of river ecosystems.

Watching Three Very Different Rivers

To test this idea, the researchers focused on three contrasting rivers in Iran. One flows through a hot, semi-arid basin with large temperature swings; another runs cold and fast from a mountainous region by the Caspian Sea; the third drains into the environmentally stressed Lake Urmia. Together they cover clear, well-oxygenated stretches as well as murkier, stressed reaches affected by farming, cities, and industry. At dozens of stations along these rivers, teams measured basic field properties such as temperature, dissolved oxygen, acidity, and electrical conductivity, and collected samples to analyze in the laboratory for organic pollution, suspended sediment, nutrients, and bacteria.

Teaching a "Super Model" to Read the Water

From this rich data set, the authors built what they call a "Super Model" using a machine learning technique known as Support Vector Regression. Instead of feeding the algorithm every available chemical, they concentrated on a small set of oxygen-related indicators: dissolved oxygen, biological oxygen demand, chemical oxygen demand, and water temperature. These measures capture how much oxygen is in the water, how quickly it is being consumed by organic and chemical pollution, and how temperature speeds up or slows down these processes. The model was trained to predict a new oxygen-based water quality index, WQIOIs, which mirrors traditional WQI scores but is driven mainly by these core oxygen signals.

Checking Accuracy, Clarity, and Trust

The team then asked three key questions: How accurate is the model, how general is it, and can we understand its decisions? First, they showed that the model predicts WQIOIs extremely well, with more than 95% of the variation explained and very small average errors. Second, when tested on rivers it had never “seen” during training, the model still closely matched a more complex, conventional index that uses many extra measurements. This suggests that a few carefully chosen oxygen indicators can stand in for a full laboratory workup. Third, the authors used an interpretability method called SHAP to peek inside the model’s logic. The analysis confirmed that high dissolved oxygen strongly raises the quality score, while high temperature and heavy organic pollution push it down, mirroring well-established ecological understanding rather than hidden quirks in the data.

Figure 2
Figure 2.

From Numbers to Real-Time Warnings

Beyond technical tests, the study explores how this tool could work in practice. By clustering river conditions into categories such as "Cold and Healthy" or "Hot and Oxygen Depleted," managers can see when a river is entering a risky state, for example during summer low flows when warm water holds less oxygen. The model also ranks samples so that a small number of readings can flag most of the truly impacted sites, which is vital when budgets and staff are limited. Because the required measurements are cheap and widely available, the same framework could be plugged into simple dashboards or early-warning systems in many regions, including those with limited laboratory capacity.

What This Means for Rivers and People

In everyday terms, the study shows that we can judge a river’s health very accurately by watching how it breathes. A compact set of oxygen-related tests, interpreted through a carefully trained machine learning model, can match the performance of far more complicated and expensive monitoring schemes. That means faster, more affordable tracking of pollution, better timing for inspections and clean-up efforts, and clearer communication to the public about when a river is safe for fish, farming, or recreation. As similar models spread and are adapted to other regions, they could become the backbone of real-time, data-driven river protection around the world.

Citation: Arzhangi, A., Partani, S. Water quality index prediction via a robust machine learning model using oxygen-related indices for river water quality monitoring. Sci Rep 16, 6102 (2026). https://doi.org/10.1038/s41598-026-36156-3

Keywords: river water quality, dissolved oxygen, water quality index, machine learning, environmental monitoring