Clear Sky Science · en

Air quality index prediction using a hybrid CEEMDAN-CNN-IGWO-BiGRU-Attention model


Why clearer air forecasts matter

City residents often hear that tomorrow’s air will be “good” or “unhealthy,” but these warnings can be vague or late. This study tackles a simple practical question: can we predict day-to-day air quality more accurately so people, doctors, and city officials can plan ahead? The authors focus on Guangzhou, a major city in southern China, and build a new computer model that turns messy pollution records into reliable next-day Air Quality Index (AQI) forecasts, aiming to support real-world early warning systems.

Figure 1. Turning noisy city air pollution records into clear next-day air quality forecasts with a smart layered model.

Making sense of messy city air

Air quality is shaped by many changing forces, from traffic and factories to weather, seasons, and sudden events like dust storms. As a result, AQI readings jump around in complicated, noisy ways that defeat many older forecasting tools. Traditional physics-based air models need huge computing power and detailed emissions inventories, while simple statistical methods struggle with the ups and downs of real urban smog. Even many modern machine learning systems still find it hard to tease out key patterns from such tangled data and often require tedious trial and error to adjust their internal settings.

Breaking the problem into cleaner pieces

The researchers’ first trick is to break the daily AQI record into several smoother layers, each capturing changes at a different time scale. They use CEEMDAN (complete ensemble empirical mode decomposition with adaptive noise), a signal-processing method that adds small amounts of artificial noise to separate fast wiggles, medium-term cycles, and slow background trends without mixing them together. High-frequency layers contain quick spikes and random fluctuations, while middle layers hold most of the meaningful day-to-day and multi-day swings, and the final layer traces the long-term trend. By turning one messy curve into several more regular subseries, the overall prediction challenge becomes easier and more targeted.
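The "split into layers, then add back together" idea can be sketched in a few lines. The example below is not CEEMDAN: it swaps in plain moving averages as a stand-in, purely to show that a series can be peeled into fast, medium, and slow layers whose sum recovers the original. The synthetic AQI series, the window lengths (7 and 31 days), and all function names are assumptions made for illustration.

```python
import math
import random

def moving_average(series, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed

def split_into_layers(series, windows=(7, 31)):
    """Peel off progressively smoother layers; the layers sum to the input."""
    layers = []
    residual = list(series)
    for window in windows:
        smooth = moving_average(residual, window)
        # Detail removed at this scale: fast wiggles first, then medium cycles.
        layers.append([r - s for r, s in zip(residual, smooth)])
        residual = smooth
    layers.append(residual)  # what remains is the slow background trend
    return layers

# Synthetic daily "AQI": seasonal swing + weekly cycle + random noise (all made up).
random.seed(0)
aqi = []
for t in range(365):
    seasonal = 15 * math.sin(2 * math.pi * t / 365)
    weekly = 8 * math.sin(2 * math.pi * t / 7)
    aqi.append(60 + seasonal + weekly + random.gauss(0, 5))

layers = split_into_layers(aqi)
reconstructed = [sum(values) for values in zip(*layers)]
max_error = max(abs(a - b) for a, b in zip(aqi, reconstructed))
print(len(layers), "layers; largest reconstruction error:", max_error)
```

Because the layers are additive, a separate forecast can be made for each one and the forecasts summed, which is the same recombination step the article describes for the full model.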

Teaching the model to read time

Each of these layers is then fed into a specialized neural network that combines two strengths. A set of one-dimensional convolution blocks scans for short local patterns, such as repeated daily cycles, using filters of different lengths. Their outputs go into a bidirectional gated recurrent unit (BiGRU) network that reads each window both forward and backward in time, capturing how pollution builds up and clears over several days. An attention module then highlights the most informative days in each window, allowing the model to focus on what matters most when forming a forecast. Finally, predictions from all layers are added back together to recover the expected overall AQI.
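Two of these building blocks, the one-dimensional convolution and the attention weighting, are simple enough to sketch by hand. The snippet below is a minimal illustration only: it leaves out the recurrent layer, and the 7-day window, the averaging filter, and the `score_weight` value are hypothetical, hand-picked numbers. In a trained network such values would be learned from data.

```python
import math

def conv1d(series, kernel):
    """Slide the kernel along the series ('valid' mode, no padding)."""
    k = len(kernel)
    return [
        sum(series[i + j] * kernel[j] for j in range(k))
        for i in range(len(series) - k + 1)
    ]

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, score_weight):
    """Weight each time step by softmax(score_weight * feature), then sum."""
    weights = softmax([score_weight * f for f in features])
    context = sum(w * f for w, f in zip(weights, features))
    return context, weights

# Hypothetical 7-day AQI window with a pollution spike in the middle.
window = [52.0, 55.0, 61.0, 90.0, 84.0, 63.0, 58.0]

# A 3-day averaging filter stands in for a learned convolution filter.
features = conv1d(window, [1 / 3, 1 / 3, 1 / 3])
context, weights = attention_pool(features, score_weight=0.1)

print("features:", [round(f, 2) for f in features])
print("attention weights:", [round(w, 3) for w in weights])
print("context value:", round(context, 2))
```

The attention weights are largest for the time steps covering the spike, so the pooled context value leans toward those days rather than treating the whole window equally.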

Letting digital “wolves” tune the knobs

Modern neural networks have many design choices, such as how many filters or units to use, how quickly to learn, and how much random dropout to apply. Picking these by hand is slow and often suboptimal. To avoid this, the authors use an improved grey wolf optimizer (IGWO), a population-based search inspired by the hunting behavior of grey wolves. Virtual “wolves” roam the space of possible settings, guided by how well each candidate network forecasts AQI on a validation set. An improved strategy for exploring and refining these candidates helps the swarm escape local dead ends and home in on combinations that keep prediction errors low and learning stable.
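The basic wolf-pack search can be sketched as follows. This is the standard grey wolf optimizer, not the authors' improved variant, whose details are not given here. To keep the example fast and self-contained, a toy function (`toy_validation_error`, a hypothetical stand-in) replaces the expensive step of training a network and measuring its validation error. The two settings searched, the base-10 logarithm of the learning rate and the dropout rate, along with their bounds, are also assumptions.

```python
import random

def toy_validation_error(params):
    """Hypothetical stand-in for validation error.
    Lowest at log10(learning rate) = -3 and dropout = 0.2."""
    log_lr, dropout = params
    return (log_lr + 3.0) ** 2 + (dropout - 0.2) ** 2

def grey_wolf_optimize(objective, bounds, n_wolves=12, n_iters=60, seed=0):
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_score, history = None, float("inf"), []
    for it in range(n_iters + 1):
        # Rank the pack; the three best wolves lead the hunt.
        wolves.sort(key=objective)
        alpha, beta, delta = (list(w) for w in wolves[:3])
        alpha_score = objective(alpha)
        if alpha_score < best_score:
            best_pos, best_score = alpha, alpha_score
        history.append(best_score)
        if it == n_iters:
            break
        # 'a' shrinks from 2 to 0: wide exploration first, fine tuning later.
        a = 2.0 * (1.0 - it / n_iters)
        for wolf in wolves:
            for d, (lo, hi) in enumerate(bounds):
                pulls = []
                for leader in (alpha, beta, delta):
                    A = a * (2.0 * rng.random() - 1.0)
                    C = 2.0 * rng.random()
                    pulls.append(leader[d] - A * abs(C * leader[d] - wolf[d]))
                # Move toward the average of the three leaders, staying in bounds.
                wolf[d] = min(hi, max(lo, sum(pulls) / 3.0))
    return best_pos, best_score, history

bounds = [(-5.0, -1.0), (0.0, 0.5)]  # log10(learning rate), dropout rate
best_pos, best_score, history = grey_wolf_optimize(toy_validation_error, bounds)
print("best settings found:", best_pos)
print("toy validation error:", best_score)
```

In a real tuning run, each call to the objective would train and evaluate a candidate network, so the number of wolves and iterations directly sets the computing cost of the search.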

Figure 2. Stepwise process that splits air data into time scales, analyzes each with neural networks, then recombines for accurate forecasts.

How well does the approach work?

Tested on eleven years of daily AQI data from Guangzhou, the new framework clearly beats a wide range of rivals, including classical methods, standard recurrent networks, and other hybrid deep models. It achieves a high coefficient of determination (R² of about 0.96) and a mean squared error roughly one third that of a strong baseline recurrent network, and still maintains reasonable accuracy when asked to look three or seven days ahead. Careful “ablation” tests, where pieces of the system are removed one by one, show that every component—the signal decomposition, the convolution blocks, the bidirectional memory, the attention layer, and the wolf-based tuning—contributes meaningfully to the final performance.
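For readers unfamiliar with the two scores mentioned above, the short example below shows how mean squared error and the coefficient of determination are computed. The four observed and predicted AQI values are made-up numbers chosen so the arithmetic is easy to follow; they are not results from the study.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observations and forecasts."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """1 minus (squared forecast error / squared spread of the observations)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and forecast AQI values for four days.
observed = [50.0, 60.0, 70.0, 80.0]
forecast = [52.0, 58.0, 71.0, 79.0]

mse = mean_squared_error(observed, forecast)  # (4 + 4 + 1 + 1) / 4 = 2.5
r2 = r_squared(observed, forecast)            # 1 - 10 / 500 = 0.98
print("MSE:", mse, "R^2:", r2)
```

An R² close to 1 means the forecasts track nearly all of the day-to-day variation in the observations, while a lower MSE means smaller typical errors, with large misses penalized most heavily.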

What this means for everyday life

To a non-specialist, the bottom line is that the authors have built a smarter way to read the hidden rhythms in urban smog records and turn them into reliable next-day AQI forecasts. The model handles both quick pollution swings and longer seasonal shifts better than existing tools, and can transfer reasonably well to other cities without retraining from scratch. While it still struggles with rare extreme events and requires significant computing effort during development, once trained it can generate forecasts in about half a second. In practical terms, this kind of system could help cities issue earlier, more precise warnings, giving residents more time to adjust outdoor plans and protect their health.

Citation: Fang, Y., Liu, S. & Su, Z. Air quality index prediction using a hybrid CEEMDAN-CNN-IGWO-BiGRU-Attention model. Sci Rep 16, 15908 (2026). https://doi.org/10.1038/s41598-026-46978-w

Keywords: air quality index, air pollution forecasting, deep learning, time series, urban environment