Clear Sky Science
Dataset for multi-perspective traffic video analysis
Why many eyes on the street matter
Anyone who has tried to cross a busy street knows that cars, bikes, and people move in ways that can be hard to predict. Cities are turning to cameras and artificial intelligence to keep these spaces safe, but most systems still look at the road from just one angle. This paper introduces a new public video dataset that watches the same crosswalk from three different viewpoints at once, giving researchers the rich raw material they need to build safer, smarter traffic systems.

One crosswalk, three ways of seeing
The dataset focuses on a single, everyday scene: a campus crosswalk at the University of Murcia in Spain. Instead of relying on one camera, the authors recorded each event with three devices working at the same time: a camera mounted on a moving car approaching the crosswalk, a camera fixed on a roadside pole at an elevated position, and a camera on a small drone hovering above. Together, these views capture the same people and vehicles from ground level, from the side, and from the sky, closely mirroring how different observers might see the same moment in real life.
Capturing both routine walks and rare mishaps
To make the data useful for both everyday monitoring and emergency situations, the team staged two types of events. In some recordings, pedestrians simply cross while the car stops and waits, reflecting normal traffic behavior. In others, an actor simulates a fall while in the crosswalk, following motion patterns designed to resemble a real accident. The car always follows the same route, and the pedestrians repeat the same basic movements, so researchers can compare how each scenario looks from the different cameras and study how unusual events stand out from routine ones.
From raw footage to powerful research fuel
A key feature of the dataset is that the video files are kept raw and unedited. The only processing is the addition of precise timestamps. Each recording also contains a simple visual cue: at the start of each crossing, one pedestrian briefly raises a hand. Together, the timestamps and the cue make it easy to line up frames from all three cameras so that the same instant can be studied from each angle. The 18 video files come from three cameras, two crossing conditions (with and without a fall), and three different spatial arrangements of the car, roadside unit, and drone (3 × 2 × 3 = 18). Researchers also receive extra images that characterize the optical properties of the roadside camera lens, helping them correct for distortion when needed.
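The alignment idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tooling: the function names, the use of millisecond timestamps, and the idea of noting by hand the frame in which the raised hand appears are all hypothetical. The sketch estimates the clock offset between two cameras from the cue and then pairs each frame of one camera with the closest frame of the other.

```python
from bisect import bisect_left


def nearest_index(sorted_times, t):
    """Return the index of the value in sorted_times closest to t."""
    i = bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    before, after = sorted_times[i - 1], sorted_times[i]
    # Prefer the later frame only when it is strictly closer.
    return i if (after - t) < (t - before) else i - 1


def align_to_reference(ref_times, ref_cue, other_times, other_cue):
    """Map each reference frame to the closest frame of another camera.

    ref_cue and other_cue are the timestamps (assumed to be noted by hand)
    at which the raised hand first appears in each camera.
    """
    offset = other_cue - ref_cue  # clock difference between the two cameras
    return [nearest_index(other_times, t + offset) for t in ref_times]


# Made-up timestamps in milliseconds, for illustration only.
roadside_times = [0, 50, 100, 150]
drone_times = [5000, 5030, 5060, 5090, 5120, 5150, 5180, 5210, 5240]

# The hand is raised at 50 ms on the roadside clock and 5120 ms on the drone clock.
matches = align_to_reference(roadside_times, 50, drone_times, 5120)
print(matches)  # drone frame index paired with each roadside frame
```

Integer timestamps are used here so that nearest-frame comparisons are exact; with floating-point seconds the same logic applies, but exact ties become less predictable.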

Testing how well machines understand the scene
To check that the dataset is truly useful, the authors ran standard object-detection tests, comparing their recordings with well-known traffic video collections such as KITTI, VisDrone, and UA-DETRAC. They used modern detection models to locate people in the videos and measured how closely the predicted outlines matched human-verified ones. On average, detections on the new dataset scored higher than on those reference collections, both in precision and in how well the predicted outlines aligned with the verified ones. By examining how often each person was visible in one, two, or all three views, the team also showed that overlapping coverage from the different cameras greatly reduces blind spots when people are hidden behind cars or street furniture.
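For readers curious about what such measurements look like in practice, here is a minimal sketch. It assumes rectangular boxes given as (x1, y1, x2, y2) and uses intersection over union (IoU) as the alignment measure; the summary does not name the exact metrics or thresholds the authors used, so the function names, the 0.5 threshold, and the sample values are all illustrative assumptions.

```python
from collections import Counter


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def precision(predictions, ground_truth, threshold=0.5):
    """Fraction of predicted boxes that match a not-yet-used verified box."""
    unmatched = list(ground_truth)
    hits = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            hits += 1
            unmatched.remove(best)  # each verified box can be matched once
    return hits / len(predictions) if predictions else 0.0


def view_coverage(visibility):
    """Count people by the number of cameras in which they are visible."""
    return Counter(len(views) for views in visibility.values())


# Made-up example: which cameras see each person at one instant.
visibility = {
    "person_1": {"car", "roadside", "drone"},
    "person_2": {"drone"},
    "person_3": {"roadside", "drone"},
}
coverage = view_coverage(visibility)
print(coverage)  # how many people are seen by 1, 2, or 3 cameras
```

A person counted under only one camera is hidden from the other two at that instant, which is the kind of blind spot that overlapping views are meant to reduce.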
What this means for future streets
For non-specialists, the core message is that this dataset offers a much more complete picture of what happens at a crosswalk than earlier collections. By combining car, roadside, and aerial views in a synchronized way, it gives engineers and scientists a realistic testbed for building traffic systems that can track people more reliably, spot accidents quickly, and cope with real-world complications like obstacles and changing viewpoints. In the long run, resources like this can help power safer crossings, more responsive traffic lights, and smarter city services that better protect everyone who uses the road.
Citation: Sanchez-Iborra, R., Kouvakis, V., Trevlakis, S.E. et al. Dataset for multi-perspective traffic video analysis. Sci Data 13, 543 (2026). https://doi.org/10.1038/s41597-026-06907-y
Keywords: traffic surveillance, multi-view video, pedestrian safety, smart cities, computer vision dataset