Clear Sky Science
A reproducible benchmark of QRS detection algorithms across diverse ECG datasets and noise conditions
Why tracking each heartbeat matters
Every beat of your heart leaves a tiny electrical signature on an electrocardiogram, or ECG. Pinpointing the exact peak of each beat is essential for calculating heart rate and the subtle beat-to-beat variations that reveal stress, sleep quality, and heart disease risk. As ECG sensors move from hospital monitors to wristbands and chest straps, researchers need to know which computer methods can still find those peaks reliably when real life adds motion, noise, and messy data.

The challenge of finding clean peaks in messy signals
The study focuses on detecting a specific point in the ECG signal called the R-peak, the sharp spike that marks each heartbeat. These peaks are the reference points for heart rate and for heart rate variability, a measure used in cardiology, neurology, and stress research. In ideal conditions the peaks are easy to see, but in real recordings the signal is distorted by body motion, loose electrodes, electrical interference, and natural differences between people, especially those with irregular rhythms. Even a single missed or wrongly detected peak can throw off later analyses, so the question is not just how well a method works on clean data, but how reliably it works across many people and recording situations.
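To make the sensitivity to a single missed peak concrete, here is a minimal Python sketch that derives heart rate and one common variability measure (RMSSD, the root mean square of successive differences) from R-peak times. The peak times, function names, and the choice of RMSSD are illustrative assumptions, not values or methods taken from the study.

```python
from math import sqrt
from statistics import mean


def rr_intervals_ms(r_peak_times_s):
    """Return the intervals between consecutive R-peaks, in milliseconds."""
    return [(b - a) * 1000.0 for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def mean_heart_rate_bpm(rr_ms):
    """Mean heart rate in beats per minute from RR intervals in milliseconds."""
    return 60000.0 / mean(rr_ms)


def rmssd_ms(rr_ms):
    """Root mean square of successive RR differences (one common HRV measure)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return sqrt(mean(d * d for d in diffs))


# Hypothetical R-peak times in seconds (made up for illustration).
peaks = [0.00, 0.80, 1.62, 2.40, 3.22, 4.00]
rr = rr_intervals_ms(peaks)

# Same recording, but the detector misses the peak at 2.40 s.
peaks_missed = peaks[:3] + peaks[4:]
rr_missed = rr_intervals_ms(peaks_missed)

print(f"clean:  HR = {mean_heart_rate_bpm(rr):.1f} bpm, RMSSD = {rmssd_ms(rr):.1f} ms")
print(f"missed: HR = {mean_heart_rate_bpm(rr_missed):.1f} bpm, "
      f"RMSSD = {rmssd_ms(rr_missed):.1f} ms")
```

With these made-up numbers, dropping one peak lowers the mean heart rate from about 75 to 60 beats per minute and inflates RMSSD from roughly 36 ms to roughly 650 ms, because the missed beat creates one artificially long interval.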
Building a common test bed for heartbeat detectors
To address this, the authors assembled a reproducible benchmark of 17 R-peak detection methods. These span classic signal processing techniques that apply filters and mathematical rules, as well as machine learning and deep learning models that learn patterns from data. All methods were evaluated in the same way on five open ECG databases from the PhysioNet platform, covering long-term monitoring, resting recordings, motion during walking and running, irregular heart rhythms, and recordings with artificial noise mixed in. For learning-based methods, the researchers trained each model only on a separate public dataset and then froze its settings, so the tests reflect how well the models generalize to new patients and conditions they have never seen.
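This summary does not spell out how detections were scored. One common convention, assumed here purely for illustration, is to count a detected peak as correct when it falls within a small tolerance window of a reference annotation, and then to report sensitivity, positive predictive value, and F1. The function names, peak positions, sampling rate, and tolerance below are hypothetical.

```python
def match_peaks(reference, detected, tolerance):
    """Greedily match detected peaks to reference peaks one-to-one.

    All positions are sample indices. A detection counts as a true positive
    when it lies within +/- tolerance samples of an unmatched reference peak.
    Returns (true_positives, false_positives, false_negatives).
    """
    ref = sorted(reference)
    det = sorted(detected)
    i = j = tp = 0
    while i < len(ref) and j < len(det):
        diff = det[j] - ref[i]
        if abs(diff) <= tolerance:
            tp += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # detection too early for this reference peak: false positive
        else:
            i += 1  # no detection close enough to this reference peak: missed beat
    return tp, len(det) - tp, len(ref) - tp


def scores(tp, fp, fn):
    """Return (sensitivity, positive predictive value, F1)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return sensitivity, ppv, f1


# Hypothetical annotations and detector output, as sample indices.
reference = [100, 460, 820, 1180, 1540]
detected = [102, 455, 640, 825, 1545]

# Assumed tolerance: 50 ms at an assumed 360 Hz sampling rate = 18 samples.
tp, fp, fn = match_peaks(reference, detected, tolerance=18)
sensitivity, ppv, f1 = scores(tp, fp, fn)
print(tp, fp, fn, sensitivity, ppv, f1)
```

In this toy case the detector finds four of the five reference beats, adds one spurious peak at sample 640, and misses the beat at sample 1180, so all three scores come out at 0.8. Applying one such scoring routine unchanged to every method and database is what makes the results comparable.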
Who wins: hand-tuned rules or learned models
Across more than a million heartbeats, some clear trends emerged. Classic signal processing methods, especially one called the Blocks of Interest approach, delivered the most consistent performance when all databases were pooled. A recurrent neural network that looks at sequences of beats excelled in the noisiest recordings, keeping its accuracy higher than most rivals when the signal was heavily contaminated. Deep learning models could perform extremely well on some datasets, particularly under strong noise, but their results tended to drop more when the new data looked different from the training material. Older reference methods that assume a very regular heartbeat struggled with recordings from patients with arrhythmias, where the rhythm is irregular by definition.

What noise and movement do to the numbers
By comparing conditions, the authors showed how different sources of disturbance affect performance. All algorithms worked very well on relaxed, resting recordings and on motion data from sitting subjects. As soon as participants started walking or running, detection quality dipped slightly but consistently for nearly every method, reflecting the impact of movement on wearable sensors. In the extreme case of the dedicated noise stress database, overall scores fell for all approaches, but the recurrent neural network remained relatively stable, hinting that using context across multiple beats helps it see through the clutter. These patterns suggest that no single detector is best everywhere and that combining methods or switching strategies based on estimated noise levels could be beneficial.
What this means for doctors, devices, and researchers
For clinicians and developers of wearable devices, the key message is practical. If you need an algorithm that works well out of the box on many kinds of ECGs, tried-and-tested signal processing approaches are still a safe choice, while deep learning methods may require carefully chosen and diverse training data to avoid surprises in new settings. The authors also provide their full code, data links, and evaluation scripts as an open framework, so future teams can plug in new algorithms and test them under the same conditions. Rather than crowning a single winner, the work maps out the strengths and weaknesses of leading methods and encourages the community to build more robust, shareable tools for reading the rhythms of the heart.
Citation: Wolf, S.M., Rahlmeier, T., Lustfeld, S. et al. A reproducible benchmark of QRS detection algorithms across diverse ECG datasets and noise conditions. Sci Rep 16, 15748 (2026). https://doi.org/10.1038/s41598-026-53724-9
Keywords: ECG, R-peak detection, heart rate variability, signal processing, deep learning