Clear Sky Science

An explainable AI-driven hybrid feature selection approach for coronary artery disease diagnosis

2026-03-25

Why this matters for your heart

Coronary artery disease is the condition behind many heart attacks, yet it often hides in plain sight until serious damage is done. Doctors have plenty of tests, but many are expensive, invasive, or hard to access, especially in low- and middle-income countries. This paper explores how a new kind of explainable artificial intelligence can sift through routine medical information to spot who is at risk, using fewer measurements while still giving doctors insight into which signs truly matter.

The problem of too much information

Modern medicine can measure dozens of traits for every heart patient: age, blood pressure, lab values, symptoms, and findings from scans and heart tracings. But not all of these clues are equally helpful. Using too many weak or redundant measurements can actually confuse computer models, slow them down, and make their predictions less reliable. Earlier studies tried many ways of trimming this list, but no single method consistently worked best, and most acted like black boxes, offering little explanation of why a given feature was kept or discarded.

A smarter way to pick the right clues

The authors propose a two-step method called SHOW (SHAP Optimized Wrapper) to tackle this issue. First, they use an explainable AI technique known as SHAP to estimate how much each medical feature contributes to predicting coronary artery disease. They do this separately for three strong machine learning models that approach the problem in different ways. Then they blend these three views into one stable ranking of features, so they are not relying on the quirks of a single model. This gives an ordered list from the most informative clinical clues to the least useful ones.
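As a rough illustration of the blending step, the sketch below averages per-model importance scores into one ranking. The summary does not say how the three views are combined, so the averaging rule here is an assumption, and the model names, feature names, and numbers are all made up for the example. In practice the scores would be mean absolute SHAP values computed for each trained model.

```python
from statistics import mean


def blend_rankings(importances_by_model):
    """Normalize each model's importances to sum to 1, average them, and rank.

    Assumption: simple averaging is used to blend the models; the paper's
    actual aggregation rule may differ.
    """
    features = list(next(iter(importances_by_model.values())))
    normalized = []
    for scores in importances_by_model.values():
        total = sum(scores.values())
        normalized.append({f: scores[f] / total for f in features})
    blended = {f: mean(n[f] for n in normalized) for f in features}
    # Most informative feature first.
    return sorted(features, key=lambda f: blended[f], reverse=True)


# Hypothetical mean |SHAP| value per feature for three models (invented numbers).
shap_importance = {
    "model_a": {"chest_pain_type": 0.40, "age": 0.10, "cholesterol": 0.30, "resting_bp": 0.20},
    "model_b": {"chest_pain_type": 0.50, "age": 0.25, "cholesterol": 0.20, "resting_bp": 0.05},
    "model_c": {"chest_pain_type": 0.30, "age": 0.10, "cholesterol": 0.45, "resting_bp": 0.15},
}

ranking = blend_rankings(shap_importance)
print(ranking)
```

Note that model_b ranks age second, but the blended list places it third: averaging across models dampens the quirks of any single one.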

Building lean and accurate prediction models

In the second step, SHOW walks down this ranked list and gradually builds a set of features for each classifier. It starts with the top feature, trains a model, and then adds the next one in line. If adding a new feature improves accuracy, it stays; if not, it is thrown away. This continues until no further gains are seen. Along the way, the data are carefully prepared: missing entries are removed, rare disease cases are balanced using a standard oversampling trick, and numerical values are scaled so that no single measurement dominates just because of its raw range.
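The keep-or-discard loop described above can be sketched as a greedy forward pass. This is a minimal sketch, not the authors' implementation: it assumes the whole ranked list is visited once and that a feature is kept only on a strict improvement. The `toy_accuracy` function and its gain values are hypothetical stand-ins for training a classifier and measuring its accuracy.

```python
def forward_select(ranked_features, evaluate):
    """Walk the ranked list; keep a feature only if it raises the score."""
    selected, best_score = [], float("-inf")
    for feature in ranked_features:
        candidate = selected + [feature]
        score = evaluate(candidate)
        if score > best_score:  # strict improvement required (assumption)
            selected, best_score = candidate, score
    return selected, best_score


# Hypothetical per-feature accuracy gains, invented for illustration only.
TOY_GAIN = {"chest_pain_type": 0.20, "cholesterol": 0.10, "age": 0.0, "resting_bp": 0.05}


def toy_accuracy(features):
    """Stand-in for 'train a model on these features and return its accuracy'."""
    return round(0.55 + sum(TOY_GAIN[f] for f in features), 2)


ranked = ["chest_pain_type", "cholesterol", "age", "resting_bp"]
selected, score = forward_select(ranked, toy_accuracy)
print(selected, score)
```

In this toy run the third feature adds nothing and is dropped, while the fourth still earns its place, which is why the loop does not stop at the first unhelpful feature.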

Putting the method to the test

To see whether SHOW really helps, the team tested it on three well-known coronary artery disease datasets that differ in size, complexity, and how many patients actually have the disease. They tried seven popular machine learning models, from simple logistic regression to more advanced techniques such as random forests and XGBoost. For each dataset, they compared performance using all available features versus only those chosen by SHOW, repeating the tests many times in a cross-validation scheme to rule out lucky flukes. Beyond overall accuracy, they tracked how well the models avoided missing sick patients (sensitivity) and how clearly they separated healthy from diseased cases.
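The three kinds of measurement mentioned above can be made concrete with a small sketch. It assumes the usual definitions: accuracy for overall correctness, sensitivity for not missing sick patients, and the area under the ROC curve (AUC) for separation of healthy from diseased cases. The summary does not name the exact metrics, so the AUC choice is an assumption, and the labels and scores below are invented.

```python
def accuracy(y_true, y_pred):
    """Share of all patients classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """Share of truly diseased patients (label 1) that were flagged."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    positives = sum(t == 1 for t in y_true)
    return true_pos / positives


def auc(y_true, scores):
    """Chance that a random diseased case scores above a random healthy one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Ties between a diseased and a healthy score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 1 = disease present, 0 = healthy.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]  # hypothetical model outputs
y_pred = [int(s >= 0.5) for s in scores]            # flag at a 0.5 threshold

print(accuracy(y_true, y_pred), sensitivity(y_true, y_pred), auc(y_true, scores))
```

Here accuracy and sensitivity both come out at 0.75 while AUC is higher, which shows why a single number can hide how a model behaves on the patients who matter most.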

What they found in real patient data

Across all three datasets, SHOW consistently allowed the XGBoost model to match or beat the best reported results in the literature while using far fewer inputs. For example, in a dataset with 55 clinical features, SHOW cut the list down to 14 yet achieved about 94% accuracy and similarly high sensitivity, meaning most patients with disease were correctly flagged. In two other datasets with 13 features each, the method selected only 5 features while keeping accuracy around 86–88%. In practical terms, this suggests that a focused handful of measurements (such as specific types of chest pain, key lab results, and particular imaging signs) can carry most of the diagnostic weight when chosen wisely.

Looking ahead to simpler, clearer heart checks

The study shows that explainable AI can do more than just make predictions; it can help clarify which everyday clinical signs truly matter for diagnosing coronary artery disease. By pinpointing a small, high-value set of measurements, SHOW could support cheaper and faster screening tools that are still highly reliable and more transparent to clinicians. While the approach is computationally heavy and will need to be streamlined for very large datasets, it offers a promising path toward smarter, more understandable AI assistants that help doctors catch heart disease earlier without drowning in data.

Citation: Elemam, T., Refaat, H. & Makhlouf, M. An explainable AI-driven hybrid feature selection approach for coronary artery disease diagnosis. Sci Rep 16, 10411 (2026). https://doi.org/10.1038/s41598-026-41712-y

Keywords: coronary artery disease, explainable AI, feature selection, medical diagnostics, machine learning