Clear Sky Science
A Bayesian network for identifying causes of breathlessness using a national electronic medical records (EMR) database
Why finding the cause of breathlessness matters
Struggling to catch your breath can be frightening, whether it appears suddenly or creeps up over months. Breathlessness is often the first sign that something is wrong with the heart or lungs, yet doctors in everyday general practice may face a long list of possible causes and only limited time and tests. This study describes a new computer-based tool that uses patterns in millions of anonymous medical records to help general practitioners (GPs) quickly home in on the most likely reasons a patient is short of breath, with the aim of speeding up diagnosis and avoiding unnecessary tests.

A common symptom with many possible roots
Breathlessness, sometimes called shortness of breath or dyspnea, is a very common complaint with serious consequences. People who experience breathlessness tend to have a worse quality of life, more anxiety and depression, and a higher risk of hospitalisation and early death. Breathlessness is especially linked to long-term lung diseases such as asthma and chronic obstructive pulmonary disease (COPD), and to heart conditions like heart failure, but it can also be due to infections, blood clots, or even cancer. Because so many illnesses share this single symptom, GPs often have to order multiple tests and refer patients to different specialists, which can delay the right treatment and drive up health-care costs.
Turning routine records into a learning tool
The researchers tapped into a large UK database of electronic medical records from 50 general practices, covering about 136,000 adults who saw a GP for breathlessness between 2002 and 2024. From these records they identified nearly 385,000 distinct "episodes" of breathlessness and linked them, where possible, to ten key diagnoses known to cause shortness of breath, including asthma, COPD, heart failure, lung cancer, pneumonia and blood clots in the lung. To do this fairly, they defined time windows around each episode: for a fast-moving problem like pneumonia, they looked only a couple of weeks before and after the visit, whereas for slower illnesses like lung cancer they looked many months either side. They also pulled out 34 simple pieces of information about each patient—such as age, sex, smoking, symptoms like cough or wheeze, current medicines, and past diagnoses.
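The time-window idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the window lengths (14 days for pneumonia, 180 days for lung cancer), the condition names, and the function `link_diagnoses` are hypothetical, not the values or code used in the study.

```python
from datetime import date

# Hypothetical window lengths in days either side of the episode.
# The study's actual windows are not stated in this summary.
WINDOW_DAYS = {"pneumonia": 14, "lung_cancer": 180}

def link_diagnoses(episode_date, diagnoses, windows=WINDOW_DAYS):
    """Return the conditions recorded within their own window of the episode."""
    linked = set()
    for condition, recorded_on in diagnoses:
        window = windows.get(condition)
        if window is None:
            continue  # condition not tracked in this sketch
        if abs((recorded_on - episode_date).days) <= window:
            linked.add(condition)
    return sorted(linked)

episode = date(2020, 3, 1)
records = [
    ("pneumonia", date(2020, 3, 10)),     # 9 days after the visit: linked
    ("pneumonia", date(2020, 5, 1)),      # 61 days after the visit: too late
    ("lung_cancer", date(2019, 11, 15)),  # 107 days before the visit: linked
]
print(link_diagnoses(episode, records))  # ['lung_cancer', 'pneumonia']
```

Using a separate window per condition means a slow-developing illness is not missed just because it was recorded months away from the visit, while a short-lived illness is not wrongly attached to an unrelated episode.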
How the smart network works
Using this information, the team built a type of statistical model called a Bayesian network. This can be pictured as a web of connected dots, where each dot represents something about the patient (for example, "current smoker" or "history of COPD") or one of the ten possible causes of breathlessness. The lines between dots show which ones directly influence each other, and each dot carries a table of probabilities that captures how strong that influence is. When a GP enters a patient’s details, the network updates the chances of each diagnosis, based on patterns learned from all previous patients in the database. The structure of the network was first learned from the data and then refined with input from lung and heart specialists to ensure it made clinical sense and did not rely on impossible cause–effect relationships.
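To make the updating step concrete, here is a toy three-dot network (smoker, COPD, wheeze) written in plain Python. Everything in it is an assumption for illustration: the structure, the probabilities, and the function names are invented, and the real network has 34 patient features and ten diagnoses. The sketch answers a query by brute-force enumeration of every combination, which only works for very small networks.

```python
from itertools import product

# Invented probabilities for illustration only (not from the study).
P_SMOKER = 0.2
P_COPD_GIVEN_SMOKER = {True: 0.30, False: 0.05}  # smoker -> COPD
P_WHEEZE_GIVEN_COPD = {True: 0.70, False: 0.10}  # COPD -> wheeze

def bern(p, value):
    """Probability that a yes/no variable takes `value`, given P(yes) = p."""
    return p if value else 1.0 - p

def joint(smoker, copd, wheeze):
    """Joint probability: the product of each dot's table entry."""
    return (bern(P_SMOKER, smoker)
            * bern(P_COPD_GIVEN_SMOKER[smoker], copd)
            * bern(P_WHEEZE_GIVEN_COPD[copd], wheeze))

def prob_copd(**evidence):
    """P(COPD | evidence), summing the joint over everything unobserved."""
    numerator = denominator = 0.0
    for smoker, copd, wheeze in product([True, False], repeat=3):
        state = {"smoker": smoker, "copd": copd, "wheeze": wheeze}
        if any(state[name] != value for name, value in evidence.items()):
            continue  # this combination contradicts what was entered
        p = joint(smoker, copd, wheeze)
        denominator += p
        if copd:
            numerator += p
    return numerator / denominator

print(f"{prob_copd():.2f}")                          # 0.10 (no details entered)
print(f"{prob_copd(smoker=True):.2f}")               # 0.30
print(f"{prob_copd(smoker=True, wheeze=True):.2f}")  # 0.75
```

Each extra detail entered shifts the probability of COPD, which is the same kind of update the tool performs across all ten diagnoses at once.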

How well the tool performs
To test the model, the researchers set aside 30% of breathlessness episodes that were not used during development. On this separate group, the tool’s ability to distinguish between patients with and without each condition ranged from moderate to excellent. For example, its performance score (known as ROC-AUC, where 0.5 is no better than a coin toss and 1.0 is perfect) was 0.94 for heart failure and 0.90 for asthma. In practical terms, if the model is shown one episode caused by heart failure and one that was not, it gives the higher probability to the right one about 94% of the time. Even for more challenging diagnoses such as non-pneumonia chest infections, performance was better than chance. Additional checks showed that the probabilities the model produced closely matched what was actually seen in the data. Not surprisingly, someone’s previous history of a disease was often the strongest clue to a new episode being caused by the same condition.
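ROC-AUC can be computed by comparing every episode that had a condition with every episode that did not, and counting how often the model scored the affected one higher. The sketch below does exactly that in plain Python. The labels and scores are made up for illustration, and `roc_auc` is a hypothetical helper, not the study's evaluation code.

```python
def roc_auc(labels, scores):
    """Share of (affected, unaffected) pairs ranked correctly; ties count as half."""
    affected = [s for y, s in zip(labels, scores) if y == 1]
    unaffected = [s for y, s in zip(labels, scores) if y == 0]
    if not affected or not unaffected:
        raise ValueError("need at least one affected and one unaffected case")
    wins = sum(
        1.0 if a > u else 0.5 if a == u else 0.0
        for a in affected
        for u in unaffected
    )
    return wins / (len(affected) * len(unaffected))

# Made-up held-out episodes: 1 = condition present, 0 = absent.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]  # model's predicted probabilities
print(round(roc_auc(labels, scores), 3))  # 0.833 (5 of 6 pairs ranked correctly)
```

This pairwise loop is fine for a handful of cases. With hundreds of thousands of episodes, a rank-based implementation from a statistics library would be the usual choice.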
What this could mean for patients and doctors
The authors have already built this network into a clinical decision support system that plugs into GP software and are testing it in a trial in Australian practices. If it continues to perform well, the tool could help doctors quickly see which diagnoses are most and least likely when someone presents with breathlessness, guiding them toward the most informative tests first. This does not replace a doctor’s judgement, and it cannot cover every possible cause, but it can provide an evidence-based "second opinion" drawn from hundreds of thousands of similar cases. In everyday terms, the study suggests that carefully analysed electronic records can be turned into a kind of quiet background advisor—one that helps shorten the road from the first frightening feeling of being short of breath to a clear diagnosis and appropriate treatment.
Citation: Kabir, A., Devaux, A., Jenkins, C. et al. A Bayesian network for identifying causes of breathlessness using a national electronic medical records (EMR) database. Sci Rep 16, 4900 (2026). https://doi.org/10.1038/s41598-026-35250-w
Keywords: breathlessness, primary care, Bayesian network, electronic medical records, diagnostic decision support