Clear Sky Science

Empowering genetic discoveries and cardiovascular risk assessment by predicting electrocardiograms from genotype

Reading the Heart from Our Genes

Most people will never have a heart rhythm test early in life, yet hundreds of thousands of volunteers now have their DNA stored in large research biobanks. This study asks a bold question: can we use those genetic records to predict what a person’s heart tracing would look like and, from that, estimate their future risk of heart disease? If so, doctors could one day warn people about cardiovascular trouble years before symptoms appear, using only a blood or saliva sample.

Figure 1.

Why Heart Signals Matter

Cardiovascular diseases are the top cause of death worldwide. A simple, painless test called an electrocardiogram (ECG) records the heart’s electrical activity and can reveal dangerous rhythm problems or damaged heart muscle. Many subtle features of an ECG, such as the heights and widths of its waves, are partly inherited. Large studies suggest that 40–70% of the differences between people’s ECGs can be traced to genetics. Unfortunately, in big biobanks like the UK Biobank, only about one in ten participants has both DNA data and ECG recordings. This makes it hard to uncover all the genetic factors behind heart disease or to use ECG information for early risk prediction at scale.

Teaching a Neural Network to Imagine ECGs

The researchers developed a deep learning model called CapECG that learns to translate a person’s genetic variants into 169 detailed ECG measurements. They trained it on more than 37,000 people of European ancestry who had both DNA and 12-lead ECG data in the UK Biobank. Because the genome contains millions of closely related markers, they first grouped nearby variants into blocks that tend to be inherited together and used a method called LD-PCA to compress each block into a few key components. CapECG then applies an “attention” mechanism to weigh which blocks matter most, and a capsule-style neural network to capture complex, layered patterns between genetic changes and ECG traits.
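
To make the block-compression idea concrete, here is a minimal Python sketch. It is not the authors' LD-PCA code. It assumes a block is a small table of genotype dosages (0, 1, or 2 copies of a variant per person) and keeps only the first principal component, found by power iteration, whereas the method described above keeps a few components per block. The function names and the toy data are invented for illustration.

```python
import math
import random


def center_columns(block):
    """Subtract each variant's mean dosage so only variation remains."""
    n = len(block)
    p = len(block[0])
    means = [sum(row[j] for row in block) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in block]


def top_component(centered, iters=200, seed=0):
    """Leading principal axis of a centered block, via power iteration."""
    rng = random.Random(seed)
    p = len(centered[0])
    axis = [rng.random() + 0.1 for _ in range(p)]  # positive starting guess
    for _ in range(iters):
        # One step multiplies the current axis by X^T X.
        scores = [sum(x * w for x, w in zip(row, axis)) for row in centered]
        axis = [
            sum(centered[i][j] * scores[i] for i in range(len(centered)))
            for j in range(p)
        ]
        norm = math.sqrt(sum(w * w for w in axis))
        if norm == 0.0:  # block has no variation at all
            return [0.0] * p
        axis = [w / norm for w in axis]
    return axis


def compress_block(block):
    """Replace a block of correlated variants with one score per person."""
    centered = center_columns(block)
    axis = top_component(centered)
    return [sum(x * w for x, w in zip(row, axis)) for row in centered]


# Hypothetical block: 6 people (rows) x 3 nearby, correlated variants (columns).
toy_block = [
    [0, 0, 0],
    [1, 1, 1],
    [2, 2, 1],
    [1, 1, 2],
    [0, 0, 0],
    [2, 2, 2],
]
scores = compress_block(toy_block)
print(scores)
```

Each person ends up with a single number per block instead of three, which is the kind of reduction that lets a model take in a whole genome's worth of blocks.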

How Well the Model Reads the Genetic Heartprint

On an internal test set of 7,422 people, CapECG’s predicted ECG traits matched the real measurements with an average correlation of about 0.62 for the 102 traits that are clearly heritable. Some features were predicted particularly well, with correlations above 0.8. One focus of the study was the spatial QRS-T angle, a three-dimensional measure of how the heart’s electrical activation and recovery line up in space. This angle has been tied to dangerous rhythm disturbances and sudden cardiac death. CapECG predicted this angle with a correlation around 0.65, and statistical checks showed that the predicted and observed values agreed closely, especially for traits with stronger genetic influence.
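
For readers who want to see what "average correlation" means in practice, the following Python sketch computes a Pearson correlation for each trait and then averages across traits. It assumes Pearson correlation is the measure in question, which the summary above does not state explicitly. The trait names and numbers are made up and are not results from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_x * var_y)


def mean_trait_correlation(predicted, observed):
    """Average the per-trait correlations; also return the per-trait values."""
    per_trait = {t: pearson(predicted[t], observed[t]) for t in predicted}
    return sum(per_trait.values()) / len(per_trait), per_trait


# Hypothetical measurements for four people and two invented traits.
observed = {
    "trait_a": [1.0, 2.0, 3.0, 4.0],
    "trait_b": [10.0, 12.0, 11.0, 15.0],
}
predicted = {
    "trait_a": [1.1, 1.9, 3.2, 3.8],
    "trait_b": [11.0, 11.5, 12.0, 13.0],
}
mean_r, per_trait_r = mean_trait_correlation(predicted, observed)
print(mean_r, per_trait_r)
```

A value of 1 would mean the predictions rise and fall in perfect step with the measurements, and 0 would mean no linear relationship at all.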

Figure 2.

Uncovering Hidden Genetic Clues and Forecasting Disease

Once trained, CapECG was applied to nearly 390,000 UK Biobank participants who had DNA but no ECG recordings, effectively “imputing” their ECGs from their genes alone. The team then ran large-scale genetic association studies on these predicted ECG traits. For the spatial QRS-T angle, they discovered 133 significant genetic sites, including 33 that overlapped with a major previous study of more than 118,000 people. That is far more overlap than an analysis of the smaller set of real ECGs alone produced. Similar gains appeared for the QT interval, a key measure linked to dangerous arrhythmias. Gene-level analysis highlighted dozens of genes involved in heart electrical signaling and rhythm control, and pointed to additional candidates not previously tied to heart function.
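
A genetic association study boils down to one small test repeated for every variant: does the trait shift as the number of copies of the variant goes up? The Python sketch below shows a single such test under simplifying assumptions. It uses an ordinary linear regression with no covariates and a normal approximation for the p-value, and it compares against 5e-8, a conventional genome-wide threshold that the summary above does not specify. Real pipelines use dedicated software, and the data here are invented.

```python
import math

# Conventional genome-wide significance level (an assumption, not from the article).
GENOME_WIDE_P = 5e-8


def association_test(dosages, trait):
    """Regress a trait on variant dosage; return (slope, std. error, p-value)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant does not vary in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosages, trait))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    if std_err == 0.0:  # perfect fit: treat as maximally significant
        return slope, std_err, 0.0
    z = slope / std_err
    # Two-sided p-value from the normal approximation (fine for large samples).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return slope, std_err, p_value


# Hypothetical sample: 60 people, dosages cycling 0, 1, 2, with a predicted
# trait that rises by 0.5 units per copy plus a small deterministic wobble.
dosages = [i % 3 for i in range(60)]
trait = [0.5 * d + ((7 * i) % 5 - 2) * 0.1 for i, d in enumerate(dosages)]

slope, std_err, p_value = association_test(dosages, trait)
print(slope, std_err, p_value, p_value < GENOME_WIDE_P)
```

Running the same test on predicted traits for hundreds of thousands of people, rather than on measured traits for tens of thousands, shrinks the standard error, which is one intuition for why more sites can reach significance.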

From Predicted Tracings to Future Heart Trouble

The researchers then built another deep learning model, DeepCVD, that uses the 169 CapECG-predicted ECG traits, plus age and sex, to estimate a person’s risk of six major cardiovascular conditions, including high blood pressure, heart attack, and atrial fibrillation. Trained on hundreds of thousands of genetically profiled participants, DeepCVD reached an average area under the ROC curve (AUC) of about 0.80 in a held-out test group, substantially better than a standard polygenic risk score approach that relied only on DNA and basic factors and reached about 0.71. A companion model, DeepCVD-Age, used the same inputs to forecast the age at which someone might be diagnosed with these conditions; its predictions correlated strongly (around 0.74) with the actual ages recorded in the database, and performed reasonably well even in people of non-European ancestry.
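
The AUC figures quoted here have a simple interpretation: pick one person who developed the condition and one who did not, and ask how often the model gave the first person the higher risk score. The Python sketch below computes AUC exactly that way. It is an illustration of the metric only, with invented labels and scores, and it says nothing about how DeepCVD itself works.

```python
def auc(labels, scores):
    """Fraction of (case, non-case) pairs in which the case scores higher.

    labels: 1 for people who developed the condition, 0 for those who did not.
    Ties count as half a win. This pairwise loop is easy to read but slow on
    large samples; production code would use a rank-based formula instead.
    """
    cases = [s for label, s in zip(labels, scores) if label == 1]
    controls = [s for label, s in zip(labels, scores) if label == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical example: four people, two of whom developed the condition.
labels = [0, 0, 1, 1]
risk_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, risk_scores))  # 3 of the 4 case/non-case pairs are ordered correctly: 0.75
```

On this scale 0.5 is no better than a coin flip and 1.0 is a perfect ranking, so a move from about 0.71 to about 0.80 is a meaningful improvement in how well people are ordered by risk.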

What This Could Mean for Patients

In plain terms, this work shows that a machine-learning system can learn enough from combined DNA and ECG data to “imagine” an ECG for people who never had the test. Those imagined ECG traits are good enough not only to discover new genes involved in heart rhythm and structure, but also to outperform widely used genetic scores at predicting who will develop heart disease and roughly when. While the approach still needs testing in independent populations and further refinement, it points toward a future in which a simple genetic test could provide a window into a person’s lifelong heart health, long before the first abnormal tracing appears on a clinic screen.

Citation: Lin, S., Yang, Y. & Zhao, H. Empowering genetic discoveries and cardiovascular risk assessment by predicting electrocardiograms from genotype. npj Digit. Med. 9, 255 (2026). https://doi.org/10.1038/s41746-026-02438-3

Keywords: cardiovascular risk prediction, electrocardiogram genetics, deep learning in medicine, genome-wide association studies, biobank data