Clear Sky Science · en

Federated spatial-temporal traffic forecasting with VMD-enhanced graph attention and LSTM

· Back to index

Why predicting city traffic really matters

Anyone who has been stuck in a traffic jam knows how unpredictable city movement can feel. Yet behind the scenes, planners, transit operators, and navigation apps all depend on computers that try to forecast how many bikes, taxis, or cars will be on each street in the next few minutes or hours. This paper explores a new way to make those predictions more accurate while keeping sensitive travel data private, using a blend of clever signal cleaning, network modeling, and shared learning across cities.

Figure 1
Figure 1.

The challenge of messy and private traffic data

City traffic patterns are wildly uneven. Morning rush hour, sudden storms, accidents, road works, and big events all push demand up and down in ways that change from place to place and from day to day. Traditional forecasting tools assume behavior is fairly regular over time, which traffic clearly is not. At the same time, many different organizations now collect detailed mobility data—from bike-share systems to taxi fleets—yet are often unable or unwilling to pool raw data because of privacy rules, commercial competition, and security concerns. A forecasting method that can cope with this messiness, learn from many partners, and still keep raw data local is therefore highly desirable.

Breaking complex signals into clearer pieces

The first ingredient of the proposed system is a signal-cleaning step called variational mode decomposition, which can be understood as a smart filter that breaks a noisy traffic curve into several simpler waves plus a leftover remainder. One wave might capture slow daily cycles, another weekly rhythms, and others the fast, jittery bursts of demand. By letting the model look at each of these strands separately, the method reduces interference between long-term trends and short-lived spikes, making the patterns easier to recognize. This decomposition happens independently on each partner’s machine, so the original travel records never leave their home organizations.

Teaching the model to follow patterns in time and space

Once the traffic signal has been split into cleaner pieces, it is fed into a deep learning backbone designed to track both how demand unfolds over time and how it spreads across the city map. A long short-term memory module acts as a kind of selective memory, deciding which past movements are worth remembering and which should fade away. A multi-head attention layer then focuses the model on the most informative moments in the recent past, such as sharp rises before rush hour or sudden drops after a storm ends. In parallel, a graph-based component treats each station or zone as a point in a network and learns how changes in one area ripple through to others, without relying on a fixed road map. Together, these pieces form a flexible engine that can capture shifting relationships in both time and space.

Sharing knowledge across cities without sharing trips

The second major idea is to let many different data owners train a shared forecasting model without ever sending their raw records to a central server. Instead, each client—say, a bike-share system in one district or a taxi fleet in another—trains the model locally and only sends updated model settings to a central coordinator. The server blends these updates into a new global model and sends it back. A client-side validation step then checks, module by module, whether the global changes actually help on that client’s own data. If not, the client keeps its local version for that part of the model. This selective adoption means each participant gains from the crowd’s experience while still tailoring the system to its own unique patterns.

Figure 2
Figure 2.

What the experiments show in the real world

To see how well this approach works, the authors tested it on two large, real datasets: bike-sharing trips in New York City and taxi rides in Chicago, both aggregated by hour and location. They compared their system with a wide range of existing deep learning and graph-based models, in both traditional centralized training and privacy-preserving federated setups. Across the board, the VMD-enhanced, federated model reduced average prediction errors substantially—by roughly a quarter to two-fifths compared with a strong baseline—while also converging reliably even when different clients had very different traffic patterns. The results suggest that cleaning the signals into multiple frequency bands and letting each client carefully decide which shared updates to accept are both crucial to achieving stable accuracy.

Takeaway: smarter, more private traffic forecasts

In everyday terms, this work shows that traffic forecasts can become both sharper and more respectful of privacy by combining three ideas: splitting demand curves into simple waves, modeling how movement spreads through a city network over time, and letting many data owners cooperate without exposing their raw logs. The proposed framework consistently outperforms earlier methods in accuracy and robustness, hinting at a future where city agencies, mobility operators, and even connected vehicles can jointly train powerful forecasting tools while keeping sensitive trip details close to home.

Citation: Mundada, T., Ramdhave, S., Jain, S. et al. Federated spatial-temporal traffic forecasting with VMD-enhanced graph attention and LSTM. Sci Rep 16, 8852 (2026). https://doi.org/10.1038/s41598-026-37917-w

Keywords: traffic forecasting, federated learning, urban mobility, graph neural networks, time series