Clear Sky Science

MIMIC-III-Ext-PPG, a PPG-based Benchmark Dataset for Cardiovascular and Respiratory Signal Analysis

Why wrist sensors can tell a life-or-death story

Many of us wear smartwatches that quietly track our pulse day and night. In intensive care units, a very similar light-based signal called photoplethysmography, or PPG, is recorded around the clock from critically ill patients. This paper introduces MIMIC-III-Ext-PPG, the largest and most detailed public collection of these pulse signals to date, designed to help researchers build and test new algorithms for spotting dangerous heart rhythms, estimating blood pressure without a cuff, and tracking breathing.

Figure 1.

A giant library of pulse snapshots

The authors assembled more than 6.3 million short, 30-second snippets of PPG signals from 6,189 intensive care patients whose data are part of the well-known MIMIC-III hospital database. Each snippet captures how light passing through a fingertip changes with every heartbeat, a simple measurement that today is available on everything from bedside monitors to consumer wearables. For many of these snippets, the dataset also includes synchronized electrocardiogram, blood pressure, and breathing signals, turning each pulse snapshot into a rich, multi-signal window onto the heart and lungs.
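As a rough illustration of what "30-second snippets" means in practice, the sketch below cuts one long recording into fixed-length pieces. The sampling rate of 125 Hz, the non-overlapping layout, and the function name `split_into_segments` are assumptions made for this example; the summary does not say how the authors actually windowed the recordings.

```python
import math

def split_into_segments(signal, fs=125.0, segment_seconds=30.0):
    """Cut a 1-D list of samples into non-overlapping fixed-length segments.

    fs (samples per second) is an assumed value; a trailing piece shorter
    than one full segment is dropped.
    """
    n = int(round(fs * segment_seconds))  # samples per segment
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]

# Synthetic stand-in for a PPG trace: 95 seconds of a 1.2 Hz pulse-like wave.
fs = 125.0
ppg = [math.sin(2 * math.pi * 1.2 * i / fs) for i in range(int(fs * 95))]

segments = split_into_segments(ppg, fs=fs)
print(len(segments), len(segments[0]))  # 3 segments of 3750 samples each
```

The 95-second example yields three full segments and discards the final 5 seconds, which is one simple way to keep every snippet the same length.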

From bedside notes to detailed heart rhythm labels

What makes this dataset stand out is not just its size, but its labels. In the original hospital system, nurses and doctors regularly recorded the patient’s heart rhythm on electronic charts. The team carefully matched these chart entries to the exact times covered by the waveform recordings, then harmonized different recording systems into a single, consistent set of 26 heart rhythm types. These range from normal rhythm and simple speeding up or slowing down, through various atrial and ventricular arrhythmias, to pacemaker-driven rhythms and complete conduction blocks. This level of detail goes well beyond earlier pulse-based datasets, which usually offered only one or two rhythm categories.
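The matching step can be pictured with a small sketch. Everything specific here is hypothetical: the `HARMONIZED` mapping, the rule "use the most recent chart entry at or before the segment start", and the one-hour maximum age are illustrative assumptions, not the authors' documented procedure.

```python
from bisect import bisect_right

# Hypothetical mapping from raw chart terms to harmonized rhythm labels.
HARMONIZED = {
    "sr": "sinus_rhythm",
    "normal sinus": "sinus_rhythm",
    "a fib": "atrial_fibrillation",
    "atrial fibrillation": "atrial_fibrillation",
}

def label_for_time(t, chart_times, chart_labels, max_age_s=3600.0):
    """Return the harmonized label of the latest chart entry at or before t.

    chart_times must be sorted (seconds since recording start). Returns None
    if no entry precedes t, the entry is older than max_age_s, or the raw
    term is not in the mapping.
    """
    i = bisect_right(chart_times, t) - 1
    if i < 0 or t - chart_times[i] > max_age_s:
        return None
    return HARMONIZED.get(chart_labels[i].strip().lower())

chart_times = [0.0, 3600.0]          # two chart entries, one hour apart
chart_labels = ["SR", "A Fib"]
segment_starts = [30.0, 3630.0, 9000.0]

labels = [label_for_time(t, chart_times, chart_labels) for t in segment_starts]
print(labels)  # ['sinus_rhythm', 'atrial_fibrillation', None]
```

The third segment gets no label because its nearest chart entry is more than an hour old, which shows why a matching rule needs an explicit limit on how stale a bedside note may be.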

Measuring more than just the heartbeat

To support a range of studies, the authors extracted a set of basic vital signs directly from the signals. From the blood pressure waveforms they computed systolic and diastolic pressures (the top and bottom numbers of a blood pressure reading); from the breathing signal they estimated breathing rate; and from the electrocardiogram they derived heart rate. These values were calculated in short time windows, using established open-source algorithms and best-practice rules to avoid spurious readings. By packaging these measurements with every 30-second segment, the dataset allows researchers to test algorithms that predict blood pressure, heart rate, or breathing rate from the pulse signal alone, and to explore how these targets change together.
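To make the idea of deriving a rate from a short window concrete, here is a minimal sketch that counts peaks and converts the median peak-to-peak interval into beats per minute. It is a deliberately simple stand-in, not one of the established open-source algorithms the authors used; the function names, the 125 Hz sampling rate, and the 30 to 220 beats-per-minute plausibility range are assumptions for this example.

```python
import math
from statistics import median

def find_peaks_simple(x, min_height):
    """Indices of local maxima that lie above min_height."""
    return [i for i in range(1, len(x) - 1)
            if x[i] > min_height and x[i - 1] < x[i] >= x[i + 1]]

def rate_per_minute(x, fs, lo=30.0, hi=220.0):
    """Estimate events per minute from the median peak-to-peak interval.

    Returns None if there are too few peaks or the rate falls outside the
    assumed plausible range [lo, hi], mimicking a rule that rejects
    spurious readings.
    """
    peaks = find_peaks_simple(x, min_height=sum(x) / len(x))
    if len(peaks) < 3:
        return None
    intervals = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    rate = 60.0 / median(intervals)
    return rate if lo <= rate <= hi else None

# Synthetic 10-second window with a 1.2 Hz rhythm (72 beats per minute).
fs = 125.0
window = [math.sin(2 * math.pi * 1.2 * i / fs) for i in range(int(fs * 10))]

hr = rate_per_minute(window, fs)
print(round(hr))  # close to 72
```

Using the median interval rather than the mean makes the estimate less sensitive to a single missed or extra peak within the window.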

Making sure the signals are trustworthy

Real-world hospital data can be messy: sensors fall off, patients move, and cables disconnect. To avoid misleading analyses, the team built a signal quality pipeline that screens each segment. For every signal type, they checked for flat lines, missing values, implausible heart or breathing rates, and inconsistent beat shapes. Segments that passed all checks were marked as high quality; those with minor issues but still usable information were tagged as low quality; and segments with serious problems were excluded entirely. The authors also validated one key label, atrial fibrillation, by comparing it against expert-reviewed electrocardiogram annotations from another study, finding high agreement and nearly perfect specificity.
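The three-tier outcome described above can be sketched as a small decision function. The thresholds (`max_missing_fraction`, `flat_tolerance`, the rate range) and the function name `quality_label` are hypothetical, and the beat-shape consistency check is left out for brevity; this shows the general pattern, not the authors' pipeline.

```python
import math

def quality_label(segment, rate_bpm, rate_range=(30.0, 220.0),
                  max_missing_fraction=0.05, flat_tolerance=1e-6):
    """Classify a segment as 'high', 'low', or 'excluded'.

    segment  : list of floats, with NaN marking missing samples
    rate_bpm : rate estimated elsewhere (None if estimation failed)
    All thresholds are illustrative assumptions.
    """
    valid = [v for v in segment if not math.isnan(v)]
    missing = len(segment) - len(valid)

    # Serious problems: nothing usable, a flat line, or too many gaps.
    if not valid or max(valid) - min(valid) <= flat_tolerance:
        return "excluded"
    if missing / len(segment) > max_missing_fraction:
        return "excluded"

    rate_ok = (rate_bpm is not None
               and rate_range[0] <= rate_bpm <= rate_range[1])
    # High quality only if every check passes; otherwise usable but flagged.
    return "high" if missing == 0 and rate_ok else "low"

clean = [math.sin(2 * math.pi * i / 100) for i in range(100)]
gappy = clean[:50] + [float("nan")] + clean[51:]
flat = [0.5] * 100

print(quality_label(clean, 72.0))  # high
print(quality_label(gappy, 72.0))  # low
print(quality_label(flat, 72.0))   # excluded
```

Keeping a separate "low" tier, rather than discarding everything imperfect, lets researchers decide for themselves how strict to be for a given task.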

Figure 2.

A foundation for future health algorithms

By combining huge scale, detailed heart rhythm labels, multiple synchronized signals, and explicit quality scores, MIMIC-III-Ext-PPG offers a powerful testbed for data-driven medicine. Researchers can use it to benchmark new methods for detecting irregular heartbeats from PPG signals like those recorded by wrist-worn sensors, estimating blood pressure without a cuff, or building multi-task models that learn several vital signs at once. Although it is not meant to guide real-time medical decisions on its own, this open dataset lays the groundwork for more reliable and generalizable algorithms that may one day turn everyday pulse sensors into early warning systems for serious heart and lung problems.

Citation: Moulaeifard, M., Kutscher, M., Aston, P.J. et al. MIMIC-III-Ext-PPG, a PPG-based Benchmark Dataset for Cardiovascular and Respiratory Signal Analysis. Sci Data 13, 668 (2026). https://doi.org/10.1038/s41597-026-07335-8

Keywords: photoplethysmography, arrhythmia detection, intensive care data, blood pressure estimation, wearable health sensors