Clear Sky Science
Interpretable machine learning based decision tree model for predicting obstructive airway disease in a large non-smoking health screening population
Why hidden lung problems matter
Many people think serious lung diseases mainly threaten long-time smokers. Yet a surprising number of non-smokers quietly develop breathing problems that go unnoticed until they become severe. This study asked a practical question: can routine health checkup data, such as age, blood pressure, and common blood tests, flag non-smoking adults whose lungs may already be struggling, long before they feel short of breath? The researchers also wanted the predictions to be easy for doctors to understand rather than a mysterious black box.
Looking for warning signs in routine checkups
The team analyzed records from a massive health screening program in Taiwan that followed more than half a million adults. From this large group, they focused on 81,055 people who had never smoked and who had complete data from their physical exams, lab tests, and lung function tests. Lung function was measured using a standard breathing test (spirometry) that compares how much air a person can forcibly blow out in the first second with the total amount they can blow out in one full breath. When this ratio drops below a set cut-off, it signals obstructed airways, a hallmark of conditions such as asthma and chronic obstructive pulmonary disease (COPD).
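The ratio behind that breathing test can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the two volumes are commonly called FEV1 (air blown out in the first second) and FVC (total air blown out), and the cut-off of 0.70 is an assumed, widely used value, since this summary does not state which cut-off the researchers applied.

```python
def fev1_fvc_ratio(fev1_liters: float, fvc_liters: float) -> float:
    """Return the share of the total exhaled volume blown out in the first second."""
    if fvc_liters <= 0:
        raise ValueError("FVC must be positive")
    return fev1_liters / fvc_liters


def is_obstructed(fev1_liters: float, fvc_liters: float, cutoff: float = 0.70) -> bool:
    """Flag airway obstruction when the ratio falls below the cut-off.

    The default cutoff of 0.70 is an assumption for illustration only;
    the study summary does not give the value it used.
    """
    return fev1_fvc_ratio(fev1_liters, fvc_liters) < cutoff
```

For instance, someone who blows out 2.0 liters in the first second of a 4.0 liter breath has a ratio of 0.5 and would be flagged, while 3.2 liters out of 4.0 gives 0.8 and would not.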

Teaching computers to spot at-risk lungs
Instead of relying on a single computer method, the researchers drew on six well-known machine learning approaches that are often used in medical prediction work. These methods included decision trees and several related techniques that build large collections of trees to boost accuracy. Each method was trained to distinguish between people with normal breathing tests and those showing airway obstruction, using 25 common pieces of information such as age, height, weight, blood pressure, education level, and routine blood measurements. To keep results reliable, the team repeatedly split the data into training and testing sets, rebalanced the data so that the relatively rare cases of airway obstruction were not swamped by the far more common normal results, and checked how well each model performed on the testing sets.
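The split-and-balance routine described above might look roughly like the following Python sketch. It rests on several assumptions: the summary does not say how the classes were balanced, so random undersampling of the common class is used here purely as one possible technique, and the toy labels, the 20 percent test share, and the three repeats are all invented for illustration.

```python
import random


def stratified_split(labels, test_fraction, rng):
    """Return (train_idx, test_idx) keeping the class mix similar in both parts."""
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return train_idx, test_idx


def undersample(indices, labels, rng):
    """Randomly drop majority-class cases so both classes are equally common."""
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    n = min(len(pos), len(neg))
    return rng.sample(pos, n) + rng.sample(neg, n)


# Toy labels (assumed): 10 people with obstruction (1) among 100 screened.
labels = [1] * 10 + [0] * 90
rng = random.Random(0)

for repeat in range(3):  # the number of repeats is an assumption
    train_idx, test_idx = stratified_split(labels, test_fraction=0.2, rng=rng)
    balanced_train = undersample(train_idx, labels, rng)
    # A model would be fit on balanced_train and scored on the untouched test_idx.
```

Only the training portion is balanced in this sketch, so the test portion keeps the realistic mix of rare and common cases. That is a design choice of the example, not a detail reported in the summary.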
Finding the most telling features
All six computer models did a reasonably good job, reaching similar scores when judged by how well they separated people with and without obstructed airways. But the real goal was to identify which health exam features mattered most, and then turn that knowledge into simple rules doctors could follow. To do this, the researchers ranked the importance of each feature in every model, then averaged these rankings. Age consistently rose to the top across methods. Measures related to body build, such as height and weight, also proved important, as did blood pressure and several routine lab tests. One of these, lactate dehydrogenase (LDH), is a broad marker of tissue stress in the body and appeared to carry useful information about lung health even when other blood tests were considered.
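The rank-then-average step can be illustrated with a short Python sketch. The model names and importance scores below are made up for the example, and only four of the 25 features are shown; the study's actual scores are not given in this summary.

```python
def ranks_from_importance(importance):
    """Map feature -> rank, where rank 1 is the most important feature."""
    ordered = sorted(importance, key=importance.get, reverse=True)
    return {feature: position for position, feature in enumerate(ordered, start=1)}


def average_ranks(per_model_importance):
    """Average each feature's rank across all models."""
    all_ranks = [ranks_from_importance(imp) for imp in per_model_importance.values()]
    features = all_ranks[0].keys()
    return {f: sum(r[f] for r in all_ranks) / len(all_ranks) for f in features}


# Invented importance scores for three hypothetical models.
per_model_importance = {
    "model_a": {"age": 0.40, "height": 0.25, "ldh": 0.20, "weight": 0.15},
    "model_b": {"age": 0.35, "height": 0.20, "ldh": 0.30, "weight": 0.15},
    "model_c": {"age": 0.50, "height": 0.22, "ldh": 0.10, "weight": 0.18},
}

avg = average_ranks(per_model_importance)
final_order = sorted(avg, key=avg.get)  # lowest average rank = most important
```

Averaging ranks rather than raw scores keeps the models on an equal footing, since each method measures importance on its own scale.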
From complex models to simple decision rules
After identifying the strongest predictors, the team built a single, easy-to-read decision tree that used only the top 30 percent of features. This simpler model performed almost as well as models that used all 25 variables, but with a structure that clinicians can visually inspect. The tree starts with age at the top, then branches based on factors like height, LDH levels, body weight, and education level. Following each branch leads to “leaf” groups that have higher or lower chances of airway obstruction. For example, adults above a certain age, or younger but shorter adults with particular lab patterns, formed groups where obstructive airway problems were more common. The authors stress that some of these markers, especially LDH, are not specific to the lungs and likely reflect overall health rather than direct lung damage.
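To show what following a branch to a leaf means in practice, here is a small hand-written tree in Python. It is only a sketch: every threshold, unit, and risk label is invented for illustration, and the tree is far smaller than the one the study describes. It does mirror the reported layout in starting with age and then branching on height and LDH.

```python
# Every threshold and risk label below is invented for illustration only;
# none of these numbers come from the study.
TREE = {
    "feature": "age", "threshold": 65.0,
    "above": {"leaf": "higher risk"},
    "at_or_below": {
        "feature": "height_cm", "threshold": 160.0,
        "above": {"leaf": "lower risk"},
        "at_or_below": {
            "feature": "ldh", "threshold": 200.0,
            "above": {"leaf": "higher risk"},
            "at_or_below": {"leaf": "lower risk"},
        },
    },
}


def classify(person, node=TREE):
    """Walk the tree and return (leaf label, list of rules followed)."""
    path = []
    while "leaf" not in node:
        feature, threshold = node["feature"], node["threshold"]
        if person[feature] > threshold:
            path.append(f"{feature} > {threshold}")
            node = node["above"]
        else:
            path.append(f"{feature} <= {threshold}")
            node = node["at_or_below"]
    return node["leaf"], path
```

Calling `classify({"age": 40, "height_cm": 155, "ldh": 250})` returns the leaf label together with the list of rules that led there. That visible trail of rules is what makes a decision tree easy for a clinician to inspect.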

What this means for everyday health checks
The study shows that it is possible to turn routine health exam data into an interpretable set of rules that highlight non-smokers who may need closer lung evaluation, such as full breathing tests or specialist referral. The model is not meant to replace lung function testing or deliver a firm diagnosis, but to act like a smart triage assistant that helps doctors notice at-risk individuals who might otherwise be overlooked. Because the approach is based on common measurements and emphasizes clear, step-by-step decision paths, it could be adapted to real-world screening settings. Future work will need to confirm these findings over time and in more diverse populations, but this research offers a promising example of how transparent artificial intelligence can support earlier detection of silent lung problems.
Citation: Chang, CY., Shen, HS., Kuo, YL. et al. Interpretable machine learning based decision tree model for predicting obstructive airway disease in a large non-smoking health screening population. Sci Rep 16, 12807 (2026). https://doi.org/10.1038/s41598-026-43633-2
Keywords: obstructive airway disease, non-smoker lung health, interpretable machine learning, decision tree screening, health checkup data