Clear Sky Science

Clustering and machine learning techniques identify air pollution regimes in Greater Cairo


Why the City’s Air Matters to Everyone

Greater Cairo, home to more than 20 million people, often ranks among the world’s smoggiest megacities. Dust blowing in from the desert mixes with exhaust from millions of cars and emissions from factories, creating a complex haze that is hard to understand and even harder to manage. This study shows how modern data tools can disentangle that haze into a few clear “modes” of pollution, giving city planners and health officials a practical way to know when the air is usually safe, when traffic is the main culprit, and when dangerous dust storms are rolling in.

Figure 1.

Seeing Patterns in a Cloud of Data

The researchers focused on the period 2023–2024 and used information from the Copernicus Atmosphere Monitoring Service, which blends satellite, ground, and model data into a detailed picture of the atmosphere. Instead of treating each pollution reading as an isolated point, they looked at several ingredients at once: tiny and coarse particles (PM₂.₅ and PM₁₀), gases such as nitrogen dioxide from vehicles, and basic weather conditions like temperature, wind speed, and pressure. Their aim was not just to forecast tomorrow’s numbers, but to uncover recurring “regimes” of air quality that tend to appear again and again over the city.

Sorting Days into Four Types of Air

To reveal those regimes, the team used a clustering method that groups similar days together based on their combined pollution and weather fingerprints. After testing different options, they found that four groups captured the structure of the data without becoming overly complex. Two of these groups turned out to be encouraging: low and very low pollution conditions made up about three-quarters of the two-year period, showing that Cairo does enjoy relatively clean air much of the time. A third group reflected days when traffic emissions dominated, marked by high levels of nitrogen dioxide from vehicles. The fourth and smallest group, only about 6% of the time, corresponded to dust storms, when coarse particles surged to levels well above health guidelines.
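The grouping step can be sketched in a few lines of Python. The summary does not name the clustering algorithm, so plain k-means is used here purely as an assumption, and every number below is synthetic: the blob centres, spreads, and day counts are made up to loosely echo the proportions described above (the split between the two clean groups is a guess). The loop over `k` mimics the idea of "testing different options" by printing how the within-cluster spread changes with the number of groups.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def standardize(rows):
    """Z-score each column so PM10 and NO2 contribute on the same scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5
            for c, m in zip(cols, means)]
    return [tuple((v - m) / s for v, m, s in zip(r, means, stds)) for r in rows]

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            for p in points]

def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centers[j])  # keep an empty cluster's center
        if updated == centers:
            break
        centers = updated
    labels = assign(points, centers)
    inertia = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia

random.seed(0)
# (name, number of days, mean PM10, mean NO2): all hypothetical values
blobs = [("very low", 90, 25, 10), ("low", 60, 50, 20),
         ("traffic", 38, 60, 60), ("dust", 12, 300, 15)]
days = [(random.gauss(pm, 5), random.gauss(no2, 3))
        for _, n, pm, no2 in blobs for _ in range(n)]
points = standardize(days)

# Compare several cluster counts by within-cluster sum of squares.
for k in range(2, 7):
    _, inertia = kmeans(points, k)
    print(k, round(inertia, 2))

labels, _ = kmeans(points, 4)
shares = {j: labels.count(j) / len(labels) for j in range(4)}
print(shares)  # fraction of synthetic days falling in each cluster
```

Standardizing first matters in a sketch like this because raw PM₁₀ values during dust events are far larger than typical NO₂ values and would otherwise dominate the distance calculation.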

Figure 2.

Teaching Machines to Recognize Each Regime

Finding patterns is useful only if they can be detected quickly in real time. To test this, the authors trained two types of classification models to recognize which regime a new set of readings belongs to. A single decision tree, which makes a series of if–then choices, correctly identified the regime more than 93% of the time. A more powerful method called a random forest, which combines the votes of many such trees, pushed this success rate above 97%. By examining which inputs mattered most, the models also revealed what drives each regime: nitrogen dioxide was especially important for spotting traffic-heavy days, while coarse dust particles (PM₁₀) were key for flagging dust storm events.
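To make the "if–then choices" and the voting idea concrete, here is a toy sketch in Python. It is not the authors' trained model: the three small trees are written by hand, and every threshold is a hypothetical value picked only for illustration. A real random forest would learn its splits from data and use many more trees.

```python
from collections import Counter

def make_tree(dust_pm10, traffic_no2, low_pm10):
    """Build a tiny hand-written decision tree from three hypothetical thresholds."""
    def tree(pm10, no2):
        if pm10 > dust_pm10:      # very high coarse particles: dust storm
            return "dust"
        if no2 > traffic_no2:     # high nitrogen dioxide: traffic-dominated
            return "traffic"
        if pm10 > low_pm10:       # moderate particles: low pollution
            return "low"
        return "very_low"
    return tree

# Three trees with slightly different (made-up) thresholds, standing in for
# the variation a random forest gets from training on resampled data.
trees = [make_tree(150, 40, 40), make_tree(180, 35, 45), make_tree(130, 45, 35)]

def forest_predict(pm10, no2):
    """Majority vote across the trees; ties go to the earliest-voted label."""
    votes = Counter(tree(pm10, no2) for tree in trees)
    return votes.most_common(1)[0][0]

print(forest_predict(320, 18))  # all three trees agree on a dust event
print(forest_predict(140, 20))  # trees disagree; the majority decides
```

In the second call one tree flags dust while the other two say low pollution, which hints at why combining trees tends to be more reliable near the boundaries between regimes than any single tree.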

From Computer Rules to Real-World Action

Beyond raw accuracy, the framework proved stable over time and fast to run, meaning it can be used alongside existing air-quality services as an early warning tool. Because the approach focuses on relative patterns rather than exact concentration values, it remains useful even if the underlying satellite-based data carry some absolute bias, as is known to happen over desert regions. In practice, this means authorities could quickly tell whether the city is entering a routine low-pollution phase, a traffic-dominated episode that calls for traffic management, or a short-lived but intense dust storm that warrants public health alerts.
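One way to see why relative patterns can survive a bias in the data is the small Python check below. It rests on a strong simplifying assumption that is not taken from the study: the bias is a constant scale factor plus a constant offset. Under that assumption, standardized values (z-scores) come out identical for the true and the biased series, so any grouping based on them would be unchanged. The readings and bias coefficients are invented.

```python
from statistics import mean, pstdev

def zscores(values):
    """Express each value as standard deviations from the series mean."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical PM10 readings, including one dust-like spike.
true_pm10 = [30.0, 45.0, 60.0, 280.0, 50.0]

# Assumed bias: readings inflated by 30% plus a fixed offset of 10.
biased_pm10 = [1.3 * v + 10.0 for v in true_pm10]

z_true = zscores(true_pm10)
z_biased = zscores(biased_pm10)

# The largest difference is at floating-point rounding level.
print(max(abs(a - b) for a, b in zip(z_true, z_biased)))
```

A bias that varies with conditions, for example one that grows only during dust events, would not cancel this neatly, so the check illustrates the idea rather than guaranteeing it.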

What This Means for People in Cairo

For residents, the main message is that Cairo’s air does not sit at one constant level of danger: it shifts among a small set of recognizable states. Most of the time the air is relatively clean, but traffic and dust can push conditions into riskier territory, especially for people with heart or lung disease. By turning massive streams of environmental data into four easy-to-understand regimes, this study offers a roadmap for smarter alerts, better planning, and more targeted efforts to cut pollution at its sources. The same recipe could also be applied to other fast-growing cities wrestling with a mix of urban smog and natural dust.

Citation: Elmourssi, D.M., El-Assy, A.M. & Amer, H.M. Clustering and machine learning techniques identify air pollution regimes in Greater Cairo. Sci Rep 16, 14038 (2026). https://doi.org/10.1038/s41598-026-49777-5

Keywords: air pollution, Greater Cairo, machine learning, dust storms, traffic emissions