Clear Sky Science

Harnessing hybrid stacking ensemble learning for accurate pulmonary embolism diagnosis using tabular clinical data

2026-05-13

Why this matters for patient care

Pulmonary embolism is a blood clot in the lungs that can kill within minutes if it is missed. Doctors rely heavily on complex scans and their own judgment to catch it in time. This study explores how smart computer systems can use routine clinical information, rather than images alone, to help flag patients who may have a hidden clot, supporting faster and more reliable decisions in busy hospitals.

Figure 1. Clinical data flow into an AI system that helps separate patients with and without lung blood clots.

The danger of hidden lung clots

Pulmonary embolism is one of the most common life-threatening heart and lung emergencies, ranking behind heart attacks and strokes. Clots that travel to the lungs can suddenly block blood flow, strain the right side of the heart, and deprive the body of oxygen. Many patients die before anyone realizes what is happening. Yet when the condition is recognized quickly and treated, the chance of survival improves dramatically. This gap between silent risk and life-saving action motivates the search for tools that can point doctors toward the right diagnosis sooner.

Limits of current tests and scores

Today, the main test for pulmonary embolism is a special type of CT scan of the chest. While powerful, these scans require expensive equipment, expert readers, and time. Standard clinical scoring systems and single machine learning models that use basic patient data have helped somewhat, but they often miss subtle patterns in large, mixed clinical datasets. As hospitals collect more digital records, there is a growing need for smarter systems that can learn from many kinds of clinical clues at once and still remain reliable and understandable to clinicians.

A team of models working together

The authors address this need using only the structured clinical information that comes with a large public CT dataset, without looking at the images themselves. They build a hybrid stacking ensemble, which is best thought of as a committee of different computer models that vote together on whether a patient has a clot. The committee includes two tree-based models, a classic neural network, and a modern transformer model designed for table-like data. Each model produces a probability that a clot is present, and a simple final model learns how to blend these opinions into one decision in a way that avoids overfitting and keeps the behavior stable.
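The blending step can be illustrated with a minimal sketch. It assumes the simple final model is a regularized logistic regression, which the summary does not state. The four columns of probabilities are synthetic stand-ins for the base models' outputs, and names such as `fit_meta_learner` and `blend` are hypothetical. In a real stack, the inputs would be out-of-fold predictions from the trained base models.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def blend(row, weights, bias):
    """Combine one patient's base-model probabilities into one probability."""
    return sigmoid(sum(w * p for w, p in zip(weights, row)) + bias)

def fit_meta_learner(base_probs, labels, lr=0.5, epochs=500, l2=0.01):
    """Fit a logistic-regression blender by batch gradient descent.

    base_probs: one row per patient, one probability per base model.
    labels: 0/1 outcomes. Returns (weights, bias).
    """
    n = len(labels)
    n_models = len(base_probs[0])
    weights, bias = [0.0] * n_models, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_models, 0.0
        for row, y in zip(base_probs, labels):
            err = blend(row, weights, bias) - y
            for j, p in enumerate(row):
                grad_w[j] += err * p
            grad_b += err
        # The L2 penalty keeps the blender simple, which limits overfitting.
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Synthetic data (an assumption for illustration): four "models" of
# decreasing quality, each a noisy view of the true label.
random.seed(0)
noise_levels = [0.15, 0.20, 0.25, 0.30]
labels = [random.randint(0, 1) for _ in range(200)]
base_probs = [
    [min(1.0, max(0.0, 0.2 + 0.6 * y + random.gauss(0.0, s))) for s in noise_levels]
    for y in labels
]

weights, bias = fit_meta_learner(base_probs, labels)
blended = [blend(row, weights, bias) for row in base_probs]
# In-sample accuracy, so it is optimistic; a real evaluation needs held-out data.
accuracy = sum((p >= 0.5) == (y == 1) for p, y in zip(blended, labels)) / len(labels)
print("blend weights:", [round(w, 2) for w in weights], "bias:", round(bias, 2))
print("in-sample accuracy:", round(accuracy, 3))
```

Because the blender has only one weight per base model plus a bias, it has little room to memorize the training data, which is one common reason stacking designs favor a simple final model.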

Letting nature guide the tuning

To get the best out of this committee, the researchers use a nature-inspired search method called the marine predators algorithm. This method explores many combinations of internal settings for each model and many ways of weighting their outputs, much like virtual hunters exploring a large ocean for the best fishing grounds. Using cross-validation to guard against chance findings, the algorithm settles on a configuration that improves how well the full system separates patients with and without clots, compared with each individual model or with simpler voting schemes.
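The general idea of a population-based search over ensemble weights can be sketched as follows. This is a deliberately simplified stand-in, not the marine predators algorithm itself: it keeps only the notions of a population of candidates, movement toward the current best candidate with random steps that shrink over time, and a memory that rejects worse moves. The data are synthetic, all function names are hypothetical, and a single held-out split replaces the cross-validation used in the study.

```python
import random

def make_data(n, rng):
    """Synthetic labels and four noisy 'base model' probabilities per patient."""
    noise = [0.15, 0.20, 0.25, 0.30]  # hypothetical noise level per model
    labels = [rng.randint(0, 1) for _ in range(n)]
    probs = [[min(1.0, max(0.0, 0.2 + 0.6 * y + rng.gauss(0.0, s))) for s in noise]
             for y in labels]
    return probs, labels

def accuracy(weights, probs, labels):
    """Accuracy of the weighted-average probability at a 0.5 threshold."""
    hits = 0
    for row, y in zip(probs, labels):
        p = sum(w * q for w, q in zip(weights, row))
        hits += (p >= 0.5) == (y == 1)
    return hits / len(labels)

def normalize(raw):
    """Force weights to be positive and sum to 1."""
    clipped = [max(v, 1e-9) for v in raw]
    total = sum(clipped)
    return [v / total for v in clipped]

def population_search(probs, labels, rng, n_agents=12, n_iters=40):
    """Search for ensemble weights with a small population of candidates."""
    n_models = len(probs[0])
    agents = [normalize([rng.random() for _ in range(n_models)])
              for _ in range(n_agents)]
    scores = [accuracy(a, probs, labels) for a in agents]
    history = [max(scores)]
    for it in range(n_iters):
        elite = agents[max(range(n_agents), key=scores.__getitem__)]
        step = 0.3 * (1.0 - it / n_iters)  # explore widely first, then refine
        for i in range(n_agents):
            candidate = normalize([a + rng.random() * (e - a) + rng.gauss(0.0, step)
                                   for a, e in zip(agents[i], elite)])
            score = accuracy(candidate, probs, labels)
            if score >= scores[i]:  # memory: keep a move only if it is no worse
                agents[i], scores[i] = candidate, score
        history.append(max(scores))
    best = max(range(n_agents), key=scores.__getitem__)
    return agents[best], scores[best], history

rng = random.Random(0)
search_probs, search_labels = make_data(300, rng)
heldout_probs, heldout_labels = make_data(300, rng)
best_weights, search_score, history = population_search(search_probs, search_labels, rng)
heldout_score = accuracy(best_weights, heldout_probs, heldout_labels)
print("weights:", [round(w, 3) for w in best_weights])
print("search accuracy:", round(search_score, 3),
      "held-out accuracy:", round(heldout_score, 3))
```

Comparing the search score with the held-out score is the point of the guard: a configuration that looks good only on the data used for tuning is likely a chance finding.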

Figure 2. Different models feed into a combiner that separates lung clot and non-clot cases into two clear outcome groups.

How well the system performs and what it learns

On the public RSNA pulmonary embolism dataset, the combined system reaches about 92 percent overall accuracy and a strong measure of discrimination between positive and negative cases. This outperforms all of the individual models and several standard ways of combining them. The authors then use explanation tools to see which clinical fields most influence the predictions. Features that directly describe clot presence and side, along with measures of strain on the right side of the heart, have the largest impact, while technical image quality flags have little effect. This pattern matches medical knowledge, suggesting that the model is focusing on clinically meaningful signals rather than noise.
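For readers unfamiliar with the two kinds of score mentioned here, the following sketch computes both on a tiny made-up example. It assumes the "measure of discrimination" is the area under the ROC curve, which this summary does not name. The eight predictions and labels are invented for illustration and are unrelated to the study's results.

```python
def accuracy(probs, labels, threshold=0.5):
    """Share of patients whose thresholded prediction matches the true label."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)

def roc_auc(probs, labels):
    """Chance that a random positive case scores above a random negative case.

    Ties count as half a win. This pairwise definition equals the area
    under the ROC curve.
    """
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Invented predictions for eight patients (1 = clot present).
probs = [0.91, 0.80, 0.35, 0.62, 0.10, 0.45, 0.70, 0.20]
labels = [1, 1, 1, 0, 0, 0, 1, 0]

print("accuracy:", accuracy(probs, labels))  # 6 of 8 correct -> 0.75
print("ROC AUC:", roc_auc(probs, labels))    # 14 of 16 pairs ranked correctly -> 0.875
```

Accuracy depends on the chosen threshold, while the AUC summarizes how well the scores rank positive cases above negative ones across all thresholds, which is why the two are usually reported together.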

What this means for future diagnosis

In plain terms, this work shows that a carefully tuned team of diverse computer models can use ordinary clinical data to help spot lung clots more accurately than single methods. While the system still needs testing beyond the dataset used here and does not replace scans or doctors, it offers a practical path toward support tools that highlight high-risk patients earlier, reduce missed diagnoses, and make better use of existing hospital data in real-world settings.

Citation: Abdelhamid, A., Moustafa, H.ED., Nafea, H.B. et al. Harnessing hybrid stacking ensemble learning for accurate pulmonary embolism diagnosis using tabular clinical data. Sci Rep 16, 15051 (2026). https://doi.org/10.1038/s41598-026-49331-3

Keywords: pulmonary embolism, clinical data, ensemble learning, machine learning, medical diagnosis