Clear Sky Science

DVS-PedX: Synthetic-and-Real Event-Based Pedestrian Dataset

2026-03-06

Why Faster Eyes on the Road Matter

When you approach a crosswalk as a driver, a fraction of a second can decide whether you stop in time. Today’s driver‑assistance systems usually rely on ordinary video cameras that capture full images many times per second. But a newer kind of “event camera” works more like a human retina, reacting only to changes in brightness at each pixel. This paper introduces DVS‑PedX, a large dataset built to help researchers teach such cameras—and brain‑inspired algorithms—to notice when people are about to cross the street, even in rain, fog, or at night.

From Regular Video to a New Kind of Vision

Traditional cameras take complete snapshots at fixed intervals, whether or not anything is moving. Event cameras, by contrast, report tiny flashes of information whenever a point in the scene gets brighter or darker. Each flash carries its location, time, and whether brightness went up or down, with timing measured in microseconds. This makes them naturally good at picking up motion and edges while ignoring largely static backgrounds. For tasks like spotting pedestrians and anticipating their intent, this “only what changes” view can be more efficient, faster, and more robust under glare, shadows, or headlights than conventional video.
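For readers who think in code, a single event can be pictured as a small record with four fields. The sketch below is only an illustration: the class name `Event`, the field names, and the sample values are assumptions made here, not the file format used by the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One brightness-change report from a single pixel (illustrative layout)."""
    x: int         # pixel column
    y: int         # pixel row
    t_us: int      # timestamp in microseconds
    polarity: int  # +1 if the pixel got brighter, -1 if it got darker


# Made-up sample stream: three events within about two milliseconds.
events = [
    Event(x=12, y=40, t_us=1_000, polarity=+1),
    Event(x=13, y=40, t_us=1_250, polarity=-1),
    Event(x=12, y=41, t_us=2_100, polarity=+1),
]


def count_by_polarity(stream):
    """Return (number of brighter events, number of darker events)."""
    on = sum(1 for e in stream if e.polarity > 0)
    off = sum(1 for e in stream if e.polarity < 0)
    return on, off
```

A static background produces no records at all, which is why such a stream stays small when little in the scene is moving.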

Building a Virtual City of Crosswalks

To give scientists controlled data to work with, the authors first used the CARLA driving simulator to create nearly 200 virtual street scenes. A self‑driving car approaches a crosswalk while a digital pedestrian may or may not step onto the road. Lighting (day, dusk, night) and weather (clear, rain, fog) are shuffled from run to run, as are pedestrian appearances and the exact timing of any crossing. Two virtual sensors, a regular color camera and a simulated event camera, look out from the driver’s point of view. The system records standard video at 30 frames per second and, in parallel, compacts the event stream into “event frames” roughly every 33 milliseconds (one video frame period), so that each event frame lines up with a video frame. Every frame is labeled simply as “crossing” or “not crossing,” making it straightforward to train and test pedestrian‑aware systems.
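The compaction step can be sketched as a simple binning operation. The example below is a minimal illustration, not the authors' pipeline: it assumes events arrive as `(x, y, t_us, polarity)` tuples, uses a window of 33,333 microseconds (one frame period at 30 frames per second), and sums polarities per pixel. The function name `events_to_frames` and the signed-count representation are choices made here for clarity.

```python
FRAME_INTERVAL_US = 33_333  # assumed window: one frame period at 30 fps


def events_to_frames(events, width, height, interval_us=FRAME_INTERVAL_US):
    """Bin (x, y, t_us, polarity) tuples into per-pixel signed count frames.

    Each returned frame is a height x width grid of integers. A positive
    value means more "brighter" than "darker" events hit that pixel during
    the window, a negative value means the opposite.
    """
    if not events:
        return []
    t_end = max(e[2] for e in events)
    n_frames = t_end // interval_us + 1
    frames = [[[0] * width for _ in range(height)] for _ in range(n_frames)]
    for x, y, t_us, polarity in events:
        frames[t_us // interval_us][y][x] += polarity
    return frames


# Two events in the first window, one in the second.
sample = [(0, 0, 10, +1), (0, 0, 20, +1), (1, 1, 40_000, -1)]
frames = events_to_frames(sample, width=2, height=2)
```

Binning trades away the microsecond timing inside each window in exchange for a grid that pairs one-to-one with a video frame, which is presumably why the dataset also ships the raw event files.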

Turning Real Dashcams into Event Streams

Virtual scenes alone are not enough: real streets are messier. To capture this, the team built a second component from a widely used dashcam collection called JAAD, which features short clips of urban driving with carefully annotated pedestrian behaviors. They ran all 346 clips through a conversion tool that simulates how an event camera would respond to each frame. This tool models brightness changes at every pixel and even interpolates in‑between frames to approximate continuous motion. The result is a “synthetic event view” of real roads, with sharp motion edges where people and cars move and much of the static background falling away. The authors checked these converted streams against data from physical event cameras used in manufacturing, showing that the synthetic events match real ones in overall activity, structure, and timing.
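The core idea behind such a conversion can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the tool the authors used: it compares two grayscale frames, works on log brightness (a common model of event-camera pixels), and emits at most one event per pixel when the change exceeds a threshold. The function name, the threshold value of 0.2, and the `+ 1.0` offset that avoids taking the log of zero are all choices made here. The frame interpolation mentioned above is not modeled.

```python
import math


def frames_to_events(prev, curr, t_us, threshold=0.2):
    """Emit (x, y, t_us, polarity) tuples for pixels whose log brightness changed.

    prev and curr are equally sized grids (lists of rows) of non-negative
    intensities. threshold is an assumed contrast sensitivity.
    """
    events = []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_prev, row_curr)):
            delta = math.log(c + 1.0) - math.log(p + 1.0)
            if delta >= threshold:
                events.append((x, y, t_us, +1))   # pixel got brighter
            elif delta <= -threshold:
                events.append((x, y, t_us, -1))   # pixel got darker
    return events


prev_frame = [[100, 100], [100, 100]]
curr_frame = [[100, 200], [20, 100]]
converted = frames_to_events(prev_frame, curr_frame, t_us=33_333)
```

Unchanged pixels produce nothing, which is how the static background "falls away" while moving edges remain.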

What the Dataset Contains and How It Performs

DVS‑PedX combines 198 simulated sequences from CARLA and 346 converted real‑world clips from JAAD. Each sequence offers matched pairs of color images and event frames, raw event files for fine‑grained timing analysis, and frame‑level crossing labels. The crossings themselves are relatively rare, mirroring real traffic, which makes the learning problem realistic and challenging. To show that the dataset is useful but not trivial, the authors trained spiking neural networks, algorithms that process information in discrete pulses, similar to biological neurons. These models performed strongly on the synthetic sequences but dropped in accuracy when tested directly on the converted real data, then improved again when a small amount of real data was mixed into training. This “simulation‑to‑reality gap” indicates that the dataset can support research in domain adaptation and multimodal fusion.
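To make "discrete pulses" concrete, here is a minimal sketch of a leaky integrate-and-fire neuron, a common textbook building block of spiking networks. The article does not say which neuron model the authors used, so this is an assumption for illustration only. The `leak` and `threshold` values are arbitrary.

```python
def lif_neuron(inputs, leak=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over a sequence of inputs.

    At each step the membrane potential decays by the leak factor, the new
    input is added, and the neuron emits a spike (1) and resets to zero if
    the potential reaches the threshold. Otherwise it emits 0.
    """
    v = 0.0
    spikes = []
    for current in inputs:
        v = leak * v + current
        if v >= threshold:
            spikes.append(1)
            v = 0.0
        else:
            spikes.append(0)
    return spikes


spike_train = lif_neuron([0.5, 0.5, 0.5, 0.0, 1.2])
```

Because the neuron stays silent until enough input accumulates, it pairs naturally with event data, where most pixels report nothing most of the time.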

Safer Streets Through Smarter Sensing

In plain terms, DVS‑PedX is a carefully assembled library of moments when people might or might not cross the street, seen through both ordinary and event‑based “eyes.” By spanning clean simulations and gritty real dashcam footage, and by including clear labels and open‑source tools, it gives researchers a common testbed for exploring how to detect pedestrians and anticipate their intent under difficult conditions. The hope is that, by learning from this dataset, future driver‑assistance and robotic systems will react faster and more reliably—bringing us a step closer to safer, more attentive machines on our roads.

Citation: Sakhai, M., Sithu, K., Oke, M.K.S. et al. DVS-PedX: Synthetic-and-Real Event-Based Pedestrian Dataset. Sci Data 13, 614 (2026). https://doi.org/10.1038/s41597-026-06969-y

Keywords: event cameras, pedestrian safety, autonomous driving, neuromorphic vision, traffic datasets