Clear Sky Science

Explainable machine learning with routine biomarkers identifies culture-defined bacteremic urosepsis

2026-03-04

Why this matters for everyday health

Most people think of a urinary tract infection as a painful but simple problem. Yet in some patients, bacteria escape from the urinary tract into the bloodstream, triggering a dangerous whole‑body reaction called urosepsis that can lead to organ failure and death. Doctors need to spot which patients are heading toward this severe state as early as possible, but current bedside tools and slow culture tests often leave a risky blind spot. This study explores whether common blood tests, combined with explainable machine‑learning methods, can flag those high‑risk patients within the first day of hospital care.

From common infection to life‑threatening illness

Urinary tract infections are among the most frequent bacterial infections worldwide. A fraction of these cases progress to urosepsis, where the body’s response to infection becomes widespread and damaging. Early on, patients with a routine UTI and those on the verge of urosepsis can look very similar: they may have fever, pain, and abnormal lab results, but not yet the clear signs of organ failure captured by standard sepsis scores. Blood cultures—needed to prove that bacteria have reached the bloodstream—can take days to turn positive. The authors therefore focused on a practical question: can we use only the lab tests routinely drawn in the first 24 hours to detect which hospitalized UTI patients already have bacteria in their blood?

Building a study around real‑world hospital data

The team analyzed records from 182 inpatients at a single hospital who were suspected of having a urinary infection and had both urine and blood cultures performed. All had urine cultures confirming infection. They were then split into two groups: 89 patients with “bacteremic urosepsis,” meaning the same bacteria were found in both blood and urine, and 93 patients with infection confined to the urinary tract, whose blood cultures stayed negative. For every patient, the researchers collected routine lab results—such as markers of inflammation, blood clotting, kidney function, and protein levels—drawn within 24 hours of the first culture order. Crucially, they used only information available before culture results came back, mirroring the uncertainty faced by clinicians in real time.
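The time-window rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field names (`test`, `drawn_at`), the dates, and the choice to measure the 24 hours in either direction from the culture order are all hypothetical and are not taken from the study.

```python
from datetime import datetime, timedelta

def labs_in_window(lab_results, culture_ordered_at, window_hours=24):
    """Keep lab results drawn within `window_hours` of the first culture order.

    Assumption: the window is applied in both directions around the order time.
    """
    window = timedelta(hours=window_hours)
    return [
        lab for lab in lab_results
        if abs(lab["drawn_at"] - culture_ordered_at) <= window
    ]

# Hypothetical records for one patient (invented for illustration).
culture_ordered_at = datetime(2024, 1, 10, 8, 0)
lab_results = [
    {"test": "procalcitonin", "drawn_at": datetime(2024, 1, 10, 9, 30)},
    {"test": "d_dimer", "drawn_at": datetime(2024, 1, 10, 20, 0)},
    {"test": "albumin", "drawn_at": datetime(2024, 1, 12, 7, 0)},  # about 47 h later
]

early = labs_in_window(lab_results, culture_ordered_at)
print([lab["test"] for lab in early])  # ['procalcitonin', 'd_dimer']
```

Filtering by draw time rather than by result availability is a simplification; a real pipeline would also need to confirm that each value was reported before the culture result.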

What the early lab tests revealed

Several blood measurements clearly differed between the two groups. Patients with bacteremic urosepsis tended to have higher levels of procalcitonin and C‑reactive protein, both indicators of intense inflammation, as well as higher white blood cell counts, blood sugar, creatinine (a marker of kidney stress), and D‑dimer, a breakdown product of blood clots that signals activation of the clotting system. At the same time, they had lower levels of albumin and total protein, suggesting a leakier circulation and poorer overall reserve, and lower platelet and lymphocyte counts. Alone, some of these tests—especially procalcitonin—already showed good ability to separate simple UTI from bloodstream infection, but none was perfect. The authors reasoned that combining them with modern algorithms could capture subtler patterns and interactions.
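How well a single test separates the two groups is commonly summarized by the area under the ROC curve (AUC): the chance that a randomly chosen bacteremic patient has a higher value than a randomly chosen patient with infection confined to the urinary tract. The sketch below computes that quantity directly by pairwise comparison. The numbers are synthetic and invented purely for illustration; they are not the study's data, and the function name is hypothetical.

```python
def auc_from_scores(positives, negatives):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Synthetic procalcitonin-like values (ng/mL), NOT from the study.
bacteremic = [0.4, 2.1, 5.6, 9.8, 0.9]
urinary_only = [0.1, 0.3, 0.5, 1.2, 0.2]

auc = auc_from_scores(bacteremic, urinary_only)
print(f"AUC = {auc:.2f}")  # AUC = 0.88
```

An AUC of 0.5 means the test does no better than chance and 1.0 means perfect separation, which is why a good single marker can still leave room for a combined model.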

Training explainable machine‑learning models

To turn these scattered lab values into a practical risk score, the researchers trained three different computer models: traditional logistic regression, and two popular tree‑based machine‑learning methods known as Random Forest and XGBoost. They randomly reserved about one‑fifth of the patients as a hidden test set and trained the models on the rest, carefully avoiding any leakage of future information. When evaluated on the unseen patients, both Random Forest and XGBoost showed “good discrimination,” meaning they assigned higher risk scores to those with bacteremic urosepsis than to those with infection limited to the urinary tract. XGBoost achieved the highest accuracy overall, but because the test group was modest in size, its numerical edge over Random Forest was not statistically firm.
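The hold-out step can be illustrated with a minimal sketch that uses only the group sizes reported above (89 and 93). Two assumptions go beyond the article: the split here is stratified, so both groups appear in the test set in roughly their original proportions, and the seed and function name are arbitrary. The article says only that about one-fifth of patients were reserved at random.

```python
import random

def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test, sampling each class separately."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)

# 1 = bacteremic urosepsis, 0 = infection confined to the urinary tract.
labels = [1] * 89 + [0] * 93
train_idx, test_idx = stratified_holdout(labels)

print(len(train_idx), len(test_idx))     # 145 37
print(sum(labels[i] for i in test_idx))  # 18 bacteremic patients in the test set
```

With a test set of this size, a difference of one or two correctly ranked patients can shift the headline metric noticeably, which is consistent with the caution about XGBoost's edge over Random Forest.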

Opening the black box of prediction

A common criticism of machine learning in medicine is that it can behave like a black box. To address this, the authors used a technique called SHAP (short for SHapley Additive exPlanations) to show how each lab test contributed to the model’s decision for each patient. In these explanations, D‑dimer, platelet count, procalcitonin, age, red cell distribution width, creatinine, white blood cells, and albumin emerged as the most influential features. Rather than any single “magic number,” the models relied on combinations, such as high clotting activity together with strong inflammation and low albumin, to nudge the predicted risk up or down. This transparency helps clinicians judge whether the model’s reasoning aligns with medical understanding and may build trust in its use as a bedside aid.
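The idea behind Shapley-based explanations can be shown on a toy example. The sketch below computes exact Shapley values by averaging each feature's marginal contribution over every possible order in which features are "switched on" from a baseline. It does not use the SHAP library, and the risk function, its weights, and the feature names are hypothetical; they are not the study's model.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values via all feature orderings (feasible for few features)."""
    names = list(x)
    contrib = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        prev = model(current)
        for name in order:
            current[name] = x[name]        # switch this feature to the patient's value
            new = model(current)
            contrib[name] += new - prev    # marginal contribution in this ordering
            prev = new
    return {name: total / len(orderings) for name, total in contrib.items()}

def toy_risk(f):
    """Hypothetical risk score with an interaction between clotting and inflammation."""
    return (0.1
            + 0.2 * f["high_d_dimer"]
            + 0.2 * f["high_procalcitonin"]
            + 0.1 * f["low_albumin"]
            + 0.3 * f["high_d_dimer"] * f["high_procalcitonin"])

patient = {"high_d_dimer": 1, "high_procalcitonin": 1, "low_albumin": 1}
baseline = {"high_d_dimer": 0, "high_procalcitonin": 0, "low_albumin": 0}

phi = shapley_values(toy_risk, patient, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.2f}")
```

In this toy case the interaction bonus is shared equally between the two interacting features, and the contributions add up to the gap between the patient's score and the baseline score. That additive property is what lets a per-patient explanation be read as a breakdown of the predicted risk.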

What this could mean for patient care

The study suggests that a simple set of routine blood tests, interpreted through an explainable machine‑learning model, could help doctors recognize which UTI patients are likely to have bacteria in their bloodstream well before culture results arrive. In practice, such a score might run automatically in the hospital’s laboratory or record system as soon as the first batch of lab results is available, prompting closer monitoring, repeat exams, or earlier adjustments to antibiotics for higher‑risk patients. Still, the work has important limits: it comes from a single center, involves a relatively small number of patients, and focuses on culture‑proven bloodstream infection rather than all forms of sepsis, including culture‑negative cases. The authors stress that their model is a promising prototype, not yet ready for routine use, and that larger, multi‑center studies are needed to confirm how well it works and how it might change outcomes.

Citation: Zhang, YL., Yu, DX., Zheng, YY. et al. Explainable machine learning with routine biomarkers identifies culture-defined bacteremic urosepsis. Sci Rep 16, 11982 (2026). https://doi.org/10.1038/s41598-026-42178-8

Keywords: urosepsis, urinary tract infection, sepsis biomarkers, machine learning in medicine, early infection detection