Clear Sky Science · en

End-to-end pipeline for automated heart failure diagnosis with clinical notes using SNOMED-CT

Why smarter reading of medical notes matters

Heart failure is common, deadly, and often diagnosed too late. Yet much of the early warning information about a patient is buried in doctors’ free‑text notes rather than in neat checkboxes or lab tables. This study shows how artificial intelligence can turn those messy notes—written in German—and routine hospital data into a structured view of each patient, and then use that view to help doctors decide who does and does not have heart failure.

Figure 1.

From scattered words to organized information

Doctors’ notes are rich but chaotic: they contain shorthand, abbreviations, and different ways of saying the same thing. The authors built an end‑to‑end digital pipeline that starts from these raw notes plus standard electronic health record (EHR) data for 846 hospital patients with and without heart failure. First, the system automatically expands abbreviations based on the surrounding sentence, so that a short code like “HT” is interpreted correctly as “hypertension” rather than, say, “head trauma.” It does this in a “zero‑shot” way, relying on large language models and example sentences rather than on hand‑labeled training data for each abbreviation.
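To make the idea of context‑based expansion concrete, here is a minimal sketch. It is not the authors' method: the study relies on large language models, whereas this toy version simply counts shared words between the note and one example sentence per candidate. The candidate table, the example sentences, and the function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical candidate expansions, each with one illustrative example
# sentence. A real system would hold many abbreviations and examples.
CANDIDATES = {
    "HT": {
        "hypertension": "patient has high blood pressure and takes antihypertensive medication",
        "head trauma": "patient fell and hit the skull with loss of consciousness",
    }
}


def bag_of_words(text):
    """Lower-case the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def expand(abbrev, sentence):
    """Pick the expansion whose example sentence best matches the context."""
    context = bag_of_words(sentence)
    options = CANDIDATES[abbrev]
    return max(options, key=lambda exp: cosine(context, bag_of_words(options[exp])))


print(expand("HT", "Known HT, blood pressure 165/95 despite medication"))
print(expand("HT", "HT after a fall, skull fracture with brief loss of consciousness"))
```

Word overlap is a crude stand‑in for the semantic understanding a language model provides, but the control flow is the same: no abbreviation‑specific training, only a comparison between the surrounding sentence and examples of each possible meaning.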

Crossing the language barrier and linking to a medical map

Because many existing tools and reference terminologies are English‑based, the next step translates German clinical notes into English. After translation, the pipeline searches for medically meaningful phrases and links them to concepts in SNOMED‑CT, a large, hierarchically organized “map” of diseases, findings, and procedures, as well as to the broad UMLS terminology. Instead of just matching exact strings, the system uses semantic similarity: it embeds both the note fragments and all candidate concept descriptions into a numerical space and retrieves the closest matches. A two‑stage process—first generous candidate gathering, then stricter filtering and use of context examples—balances high coverage with precision, and can be refined over time using feedback from real data and clinicians.
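The two‑stage retrieval described above can be sketched in a few lines. This is an illustration under strong assumptions, not the pipeline's actual code: character trigram counts stand in for a neural embedding model, the concept table is tiny, and the concept IDs are made‑up placeholders rather than real SNOMED‑CT codes. The parameter names `top_k` and `min_score` are also hypothetical.

```python
import math
from collections import Counter

# Placeholder concept table. The IDs are invented for this sketch and are
# NOT real SNOMED-CT identifiers.
CONCEPTS = {
    "C1": "heart failure",
    "C2": "renal failure",
    "C3": "atrial fibrillation",
    "C4": "hypertension",
}


def embed(text):
    """Toy embedding: counts of character trigrams (stand-in for a neural encoder)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def link(mention, top_k=3, min_score=0.5):
    """Return (concept_id, score) pairs for a note fragment, best match first."""
    query = embed(mention)
    scored = [(cosine(query, embed(desc)), cid) for cid, desc in CONCEPTS.items()]
    # Stage 1: generous candidate gathering, keep the top_k nearest concepts.
    candidates = sorted(scored, reverse=True)[:top_k]
    # Stage 2: stricter filtering, drop candidates below the similarity threshold.
    return [(cid, round(score, 2)) for score, cid in candidates if score >= min_score]


print(link("hart failure"))  # a misspelled mention still lands nearest to C1
print(link("xyz"))           # nothing is similar enough, so no link is made
```

Separating the two stages lets the first one favor coverage while the second one protects precision; in the study, the second stage additionally draws on context examples, which this sketch omits.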

Figure 2.

Putting the pipeline to the test

The researchers rigorously evaluated each major step. On widely used English test sets, their abbreviation expansion reached up to 96.1% total accuracy, rivaling or beating earlier methods. Their entity‑linking approach achieved competitive scores compared with the established MedCAT toolkit, and three cardiologists who reviewed links on German records judged about three quarters of them to be complete matches. Finally, the team combined the standardized SNOMED‑CT concepts with structured EHR information (such as age, lab values, and diagnoses) and trained a support vector machine classifier to sort patients into four groups: no heart failure and three main heart failure subtypes. The best version reached an F1 score of 65.3%, essentially matching a strong neural baseline based on a fine‑tuned German medical BERT model.
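For readers unfamiliar with the F1 score, the sketch below shows how it can be computed for a four‑group problem like this one. Two things are assumptions: this summary does not say how the per‑group scores were averaged, so the sketch uses a plain (macro) average, and the group names and the six example patients are invented for illustration.

```python
def f1_per_class(y_true, y_pred):
    """F1 for each label: 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # correctly found
        fp = sum(t != label and p == label for t, p in pairs)  # false alarms
        fn = sum(t == label and p != label for t, p in pairs)  # missed cases
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Invented example: six patients, four hypothetical group names.
truth = ["no_hf", "no_hf", "reduced", "reduced", "mildly_reduced", "other_subtype"]
guess = ["no_hf", "no_hf", "reduced", "mildly_reduced", "reduced", "other_subtype"]

print(f1_per_class(truth, guess))
print(macro_f1(truth, guess))
```

Because a macro average weights every group equally, a single hard group, such as one that is frequently confused with its neighbors, can pull the overall score down even when the other groups are classified well.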

What the system gets right—and where it struggles

The classifier was particularly good at recognizing patients with no heart failure (about 86% accuracy) and those with clearly reduced pumping function. It did less well on the “in‑between” group with mildly reduced function, which is also difficult for human doctors and often overlaps clinically with other forms. The authors’ approach has several advantages: it can work even when training data are scarce, it is more transparent than black‑box neural text models because predictions are tied to explicit medical concepts, and it helps make German notes interoperable with international standards. At the same time, the study highlights remaining challenges, including occasional mis‑links between similar concepts, the difficulty of capturing nuances such as symptom severity, and the possibility that discharge summaries may already contain late‑stage clues that make the task easier than truly early detection.

What this means for patients and doctors

In plain terms, this work shows that computers can learn to read and organize complex clinical notes well enough to assist in diagnosing heart failure at a level comparable to a fine‑tuned neural language model, while remaining more interpretable and easier to adapt to new hospitals and languages. By turning unstructured text into standardized building blocks on a shared medical map, the pipeline paves the way for decision support tools that can flag at‑risk patients earlier, help avoid missed or delayed diagnoses, and support more personalized care, first for heart failure and ultimately for many other diseases.

Citation: Tang, FS.KB., Verket, M., Müller-Wieland, D. et al. End-to-end pipeline for automated heart failure diagnosis with clinical notes using SNOMED-CT. Sci Rep 16, 12751 (2026). https://doi.org/10.1038/s41598-026-48771-1

Keywords: heart failure diagnosis, clinical notes, SNOMED CT, medical text mining, clinical decision support