Clear Sky Science
Explainable machine learning for long-term cardiovascular disease risk prediction in Chinese middle-aged and older adults: a 9-year longitudinal cohort study with web-based risk calculator
Why heart risk prediction matters
Cardiovascular disease is now the top cause of death in China, especially among people in mid to later life. Yet most tools that doctors use to estimate someone’s future risk of heart disease and stroke were built on Western populations and do not fit Chinese adults very well. This study asked whether modern artificial intelligence methods could offer more accurate, yet still understandable, long-term risk estimates tailored to Chinese adults aged 45 and older.
A nationwide look at ageing hearts
The researchers drew on the China Health and Retirement Longitudinal Study, a large ongoing survey that follows tens of thousands of community-dwelling adults across most provinces. From this resource they selected 8,080 people aged at least 45 years who had no cardiovascular disease when they entered the study in 2011. These participants were then followed for nine years to track new cases of heart disease and stroke. The team began with 77 pieces of information that are easy to collect in clinics, including age, region, past illnesses, mood symptoms, sleep habits, body measurements and blood test results. Using standard statistical checks, they narrowed this down to 11 key factors that were both practical to measure and strongly linked with later cardiovascular events.

Teaching computers to spot patterns
Next, the investigators tested ten different computer-based prediction methods, ranging from traditional logistic regression to more flexible approaches such as random forests, gradient boosting and neural networks. They split the participants into a training group, used to build each model, and a separate validation group, used to test how well the models worked in new people. Performance was judged on three things: how accurately each method separated those who went on to develop cardiovascular disease from those who stayed free of it, how well predicted risks matched actual event rates, and how useful the predictions would be in real-life decisions about who should receive extra prevention efforts.
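For readers curious how such checks look in practice, the sketch below illustrates two of the ideas in plain Python: how well a model separates cases from non-cases (the area under the ROC curve, or AUC) and whether average predicted risk matches the observed event rate. Everything in it is an assumption made for illustration: the function names, the 70/30 split and the ten made-up people are not taken from the study, which does not report its code here.

```python
import random

def split(rows, train_fraction=0.7, seed=0):
    """Shuffle rows and cut them into a training and a validation group.
    The 70/30 ratio is an assumption, not the study's reported split."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def auc(labels, scores):
    """Chance that a randomly chosen case gets a higher predicted risk
    than a randomly chosen non-case (ties count as half)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))

def calibration_gap(labels, scores):
    """Mean predicted risk minus observed event rate.
    A value near zero means risks are about right on average."""
    return sum(scores) / len(scores) - sum(labels) / len(labels)

# Made-up (outcome, predicted risk) pairs; 1 = developed cardiovascular disease.
toy = [(1, 0.9), (1, 0.7), (0, 0.6), (1, 0.4), (0, 0.3),
       (0, 0.2), (0, 0.1), (1, 0.8), (0, 0.5), (0, 0.05)]

train, valid = split(toy)
labels = [y for y, _ in toy]
scores = [s for _, s in toy]

print("training size:", len(train), "validation size:", len(valid))
print("AUC:", auc(labels, scores))
print("calibration gap:", calibration_gap(labels, scores))
```

In a real analysis the model would be fitted on the training group only and both measures would be computed on the validation group; the toy example scores all ten rows simply to keep the arithmetic easy to follow.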
Which everyday factors mattered most
The random forest method came out ahead, achieving strong accuracy and the best balance between catching high-risk individuals and avoiding false alarms. To open up this method’s inner workings, the team used an explanation technique called SHAP, which assigns each risk factor a contribution to the final prediction. This analysis showed that waist size was the single most influential factor: each extra centimetre around the waist raised nine-year risk noticeably, highlighting the importance of abdominal fat in this population. High triglyceride levels, older age and a history of high blood pressure were also major drivers of risk, while higher levels of protective HDL cholesterol were linked with lower risk. Interestingly, mood and sleep patterns carried independent information: higher depression scores and too little or too much night-time sleep both nudged risk upward even after accounting for traditional medical factors.
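SHAP is built on Shapley values, an idea from game theory for sharing out a total fairly among contributors. The sketch below computes exact Shapley values by brute force for a tiny, invented risk score, to show what "assigning each factor a contribution" means. It is not the study's random forest and does not use the SHAP software, which relies on faster algorithms; the function `toy_model`, its coefficients and the two example people are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.
    Features outside a coalition are set to their baseline value."""
    names = list(x)
    n = len(names)
    contributions = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {f: (x[f] if f in coalition else baseline[f])
                           for f in names}
                with_feature = dict(without)
                with_feature[name] = x[name]
                total += weight * (model(with_feature) - model(without))
        contributions[name] = total
    return contributions

def toy_model(p):
    """Hypothetical risk score; coefficients are invented for illustration."""
    score = 0.002 * p["waist_cm"] + 0.03 * p["triglycerides"] + 0.004 * p["age"]
    if p["hypertension"]:
        score += 0.05 + 0.0005 * p["waist_cm"]  # small interaction with waist
    return score

# Made-up person to explain, and a made-up reference person to compare against.
person = {"waist_cm": 95, "triglycerides": 2.1, "age": 62, "hypertension": 1}
reference = {"waist_cm": 85, "triglycerides": 1.5, "age": 55, "hypertension": 0}

phi = shapley_values(toy_model, person, reference)
for name, value in phi.items():
    print(name, round(value, 4))
```

A useful property to check is that the contributions add up to the difference between the person's score and the reference score, which is what lets SHAP-style plots break a single prediction into per-factor pieces.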

From research model to everyday tool
Because all 11 predictors are routinely available in primary care, the team translated the best-performing model into a simple web-based calculator. Users enter age, region, selected medical history items, waist circumference, two common blood fat measurements, depression score and usual sleep duration. The tool then returns an estimate of that person’s chance of developing cardiovascular disease over the next nine years. The authors stress that this calculator is designed to support, not replace, professional judgment and should be used alongside full clinical evaluation, local resources and patient preferences.
What this means for patients and doctors
The study shows that carefully designed and explained machine learning can give more accurate long-term heart and stroke risk estimates for middle-aged and older Chinese adults than traditional formulas. It also underlines that what happens at the waistline, in blood fats and in daily sleep and mood all matter for future cardiovascular health. By packaging these insights into a free online calculator, the work offers community clinics and individuals a low-cost way to identify higher-risk people earlier and guide tailored prevention strategies, while leaving final decisions in the hands of clinicians.
Citation: Zhu, XY., Li, W., Pan, XY. et al. Explainable machine learning for long-term cardiovascular disease risk prediction in Chinese middle-aged and older adults: a 9-year longitudinal cohort study with web-based risk calculator. Sci Rep 16, 14998 (2026). https://doi.org/10.1038/s41598-026-45297-4
Keywords: cardiovascular disease, machine learning, risk prediction, waist circumference, Chinese adults