Clear Sky Science
Predicting lung cancer stage at diagnosis based on self-reported symptoms and background factors using machine learning models
Why catching lung cancer early is so hard
Lung cancer is one of the deadliest cancers largely because it is often found too late, when treatment options are limited. Yet many people with lung cancer do have symptoms before and at the time of diagnosis—such as cough, breathlessness, tiredness, or weight loss—that might, in theory, alert doctors sooner. This study asked a simple but important question: if patients systematically report their symptoms and background information in detail, can computers learn to spot who is likely to have lung cancer and whether it is at an early or late stage?

Listening to patients’ own stories
The researchers followed 486 people referred to a specialist clinic in Stockholm because their doctors suspected lung cancer. Everyone completed a detailed electronic questionnaire, called PEX-LC, on a tablet. It asked about 57 background factors (such as age, smoking, living situation, and past lung illnesses) and more than 100 possible symptoms, from breathing problems and cough to pain, fatigue, appetite changes, and fever. The questions captured not just the very first warning signs but also which symptoms were present around the time of diagnosis. Over the following year, medical records revealed who was diagnosed with lung cancer and whether it was at a non-advanced stage (mostly stages I–IIIa) or an advanced stage (stages IIIb–IV).
Who turned out to have lung cancer
Of the people referred, about four in ten did not have cancer, while six in ten were diagnosed with lung cancer, split roughly evenly between non-advanced and advanced stages. Compared with those without cancer, patients with lung cancer tended to be older, more likely to smoke daily, more likely to live alone, and more likely to have lost weight during the previous year. Among those with advanced-stage disease, men were overrepresented, and prior lung conditions such as asthma, chronic obstructive pulmonary disease, and pneumonia were more common. These background patterns suggest that everyday factors—age, smoking history, living situation, and recent health changes—remain powerful signals of risk, even before focusing on specific symptoms.
Symptoms that stand out
When the team compared reported symptoms, they saw that people with non-advanced lung cancer looked surprisingly similar to those without cancer: only a whistling breathing sound and the absence of fever clearly set them apart in simple one-by-one comparisons. In contrast, those with advanced-stage lung cancer had many more distinctive complaints. They were more likely to report shortness of breath, gasping for air, irritating cough, and noisy breathing, as well as pain (especially in the back), strong fatigue, weakness, chills, and problems with eating such as early fullness and loss of appetite. These patterns confirm that by the time lung cancer is advanced, it often disrupts multiple body systems, whereas early disease can hide behind vague or easily dismissed sensations.

What the computers could and could not do
To see whether complex combinations of answers might tell a clearer story than any single symptom, the researchers trained several types of machine learning models. These algorithms learned from 129 different questionnaire variables to separate people with non-advanced cancer from those without cancer, and separately to distinguish advanced cancer from no cancer. The models achieved only moderate accuracy: they did better than chance but were far from perfect, especially for early-stage disease. Background factors such as age, smoking status, sex, and living alone consistently ranked among the most influential predictors. Some symptoms—irritating cough, whistling or noisy breathing, gasping for air, tightness in the throat, pain, and appetite or weight changes—also contributed, particularly for advanced cancer. However, no small set of symptoms dominated; rather, dozens of subtle features needed to be combined to reach modest performance.
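To make "better than chance but far from perfect" concrete, here is a minimal sketch of one common way to score such a model: the area under the ROC curve (AUC). This summary does not say which metric the authors reported, so the choice of AUC is an assumption, and the function name `roc_auc`, the labels, and the risk scores below are all hypothetical toy values, not study data. An AUC of 0.5 corresponds to guessing and 1.0 to perfect separation of cancer from no cancer.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example (NOT study data):
# 1 = lung cancer, 0 = no cancer; scores are made-up model risk estimates.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.55, 0.3, 0.7, 0.5, 0.4, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

In this toy case the score ranks a cancer patient above a non-cancer patient in 11 of 16 possible pairs, which is clearly above the 8 of 16 expected from guessing but still leaves many pairs misordered. That is the general shape of a "moderate" result.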
What this means for patients and doctors
The study shows that simply asking people in detail about their symptoms and life circumstances can reveal meaningful patterns linked to lung cancer, but that these signals are often faint, especially in the earlier stages when treatment has the best chance of cure. Machine learning models using only questionnaire data can help sort which referred patients might need especially urgent investigation, yet they are not accurate enough to stand alone as screening tools or diagnostic tests. For patients and clinicians, the main takeaway is that age, smoking, living alone, and recent weight loss, combined with persistent breathing problems, pain, appetite loss, or unexplained fatigue, should lower the threshold for thorough lung checks. The authors argue that the future of earlier lung cancer detection will likely come from blending such self-reported information with clinical data and biological tests, rather than relying on symptoms alone.
Citation: Gustavell, T., Sissala, N., Pernemalm, M. et al. Predicting lung cancer stage at diagnosis based on self-reported symptoms and background factors using machine learning models. Sci Rep 16, 11866 (2026). https://doi.org/10.1038/s41598-026-46710-8
Keywords: lung cancer, early detection, patient-reported symptoms, machine learning, risk assessment