Clear Sky Science

Validation of automated 5 mL thin liquid swallowing sound segmentation for estimating audio-derived pharyngeal clearance time

2026-03-03

Why swallowing sounds matter

Swallowing is something most of us take for granted, but for millions of people, especially older adults and those with neurological disease, it can be difficult and dangerous. When swallowing goes wrong, food or liquid can slip into the lungs, leading to malnutrition, choking, or serious infections like pneumonia. Today’s best tests for swallowing problems rely on X‑ray movies taken in hospital. This study explores a much simpler approach: listening to swallowing sounds through a small electronic stethoscope on the neck, and using a computer algorithm to time how long the throat takes to clear each sip of liquid.

From hospital X‑rays to smart bedside tools

The current gold standard for examining swallowing is the videofluoroscopic swallowing study, an X‑ray movie that shows a contrast liquid moving from the mouth down the throat. It reveals how quickly and safely the liquid passes key structures and whether any remains behind. However, this test requires special equipment, trained staff, and exposes patients to radiation, making it hard to repeat often or perform at the bedside or at home. In contrast, cervical auscultation—listening to sounds from the throat—can be done anywhere, but has traditionally been subjective, relying on a clinician’s ear. With advances in digital sensors and signal processing, the authors aim to turn these sounds into a reliable numerical measure that reflects what the X‑rays see.

Capturing the sound of a swallow

The team studied 45 patients in a Japanese hospital who were already undergoing X‑ray testing for suspected swallowing problems and could safely swallow a 5‑milliliter sip of thin liquid. Each patient wore an electronic stethoscope over the front of the neck while swallowing. The same video camera recorded both the X‑ray screen and the sound signal, allowing the two to be precisely synchronized. A rule‑based computer algorithm monitored the loudness of the sound stream in small time slices, marking when a burst of activity began and when it ended. The time between these two points, called the audio‑derived pharyngeal clearance time, was taken as the interval during which the throat was actively moving the liquid through.
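To make the idea of loudness-based segmentation concrete, here is a minimal Python sketch. It is not the authors' algorithm: the summary does not give the slice length, the threshold, or the stopping rule, so the 10 ms frames, the fixed RMS threshold, and the "five quiet frames" rule below are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def frame_rms(samples, frame_len):
    """RMS loudness of consecutive, non-overlapping frames."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def segment_burst(samples, sample_rate, frame_ms=10, threshold=0.1,
                  min_quiet_frames=5):
    """Return (onset_s, offset_s) of the first loud burst, or None.

    All parameter values are illustrative assumptions, not the study's.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    onset = None       # index of the first frame at or above the threshold
    last_loud = None   # index of the most recent loud frame
    quiet = 0          # consecutive quiet frames seen since the last loud one
    for idx, level in enumerate(frame_rms(samples, frame_len)):
        if level >= threshold:
            if onset is None:
                onset = idx
            last_loud = idx
            quiet = 0
        elif onset is not None:
            quiet += 1
            if quiet >= min_quiet_frames:
                break  # the burst has ended
    if onset is None:
        return None
    frame_s = frame_len / sample_rate
    return onset * frame_s, (last_loud + 1) * frame_s

# Synthetic signal: 0.5 s of silence, a 0.7 s burst, then 0.5 s of silence.
sample_rate = 1000
signal = ([0.0] * 500
          + [0.5 if i % 2 == 0 else -0.5 for i in range(700)]
          + [0.0] * 500)
onset_s, offset_s = segment_burst(signal, sample_rate)
clearance_s = offset_s - onset_s
print(f"onset={onset_s:.2f}s offset={offset_s:.2f}s clearance={clearance_s:.2f}s")
```

On this synthetic signal the difference between offset and onset plays the role of the audio‑derived clearance time. Real stethoscope recordings would likely need noise handling and an adaptive threshold, which this sketch leaves out.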

Matching sound events to real throat movements

To judge whether the sound‑based timing truly reflected the physical act of swallowing, an experienced speech‑language pathologist went through the X‑ray movies frame by frame. The pathologist marked three key moments: when the liquid first touched the flap‑like epiglottis at the back of the tongue, when the upper esophageal sphincter (the gateway to the esophagus) opened, and when it closed again. Together, these landmarks define how long the liquid spends passing through the throat. The researchers then compared these X‑ray timings with the computer’s sound‑based start and end points across 84 swallows. The algorithm successfully detected 80 of them (about 95%), and in most cases the sound interval overlapped strongly with the X‑ray‑defined throat passage.

How well did the timing line up?

The sound‑based onset occurred after the liquid reached the epiglottis in 96% of swallows and usually within about half a second, indicating that the algorithm is not triggered by early mouth movements but by events in the throat. The sound‑based offset typically happened after the upper esophageal sphincter closed, meaning the captured sound interval covered the full active phase of throat transport. On average, the audio‑derived clearance time was about 0.7 seconds, very close to the 0.79‑second duration measured from the X‑ray landmarks. Importantly, this sound‑based timing stayed stable even in patients who leaked some liquid from the mouth into the throat before the main swallow, a problem known as poor oral containment; in contrast, the X‑ray‑based measure stretched out in these cases. This suggests that the sound method focuses on the core throat action rather than being confused by earlier, passive dribbling.
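The kind of comparison described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's analysis: the four swallows are made-up values, the `Swallow` record and `summarize` helper are hypothetical, and the X‑ray duration is assumed to run from epiglottis contact to sphincter closure.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Swallow:
    """Event times in seconds on a shared, synchronized clock."""
    epiglottis_contact_s: float  # X-ray: liquid first touches the epiglottis
    ues_close_s: float           # X-ray: upper esophageal sphincter closes
    audio_onset_s: float         # sound-based start of the burst
    audio_offset_s: float        # sound-based end of the burst

def summarize(swallows):
    """Compare sound-based timings with X-ray landmarks."""
    n = len(swallows)
    return {
        # Share of swallows whose sound onset follows epiglottis contact.
        "onset_after_contact": sum(
            s.audio_onset_s > s.epiglottis_contact_s for s in swallows) / n,
        # Share of swallows whose sound offset follows sphincter closure.
        "offset_after_ues_close": sum(
            s.audio_offset_s > s.ues_close_s for s in swallows) / n,
        "mean_audio_clearance_s": mean(
            s.audio_offset_s - s.audio_onset_s for s in swallows),
        # Assumed definition: epiglottis contact to sphincter closure.
        "mean_xray_duration_s": mean(
            s.ues_close_s - s.epiglottis_contact_s for s in swallows),
    }

# Made-up example values, NOT data from the study.
example = [
    Swallow(1.00, 1.80, 1.20, 1.90),
    Swallow(2.00, 2.75, 2.10, 2.80),
    Swallow(0.50, 1.30, 0.45, 1.25),
    Swallow(3.00, 3.90, 3.30, 3.95),
]
summary = summarize(example)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Keeping all events on one clock matters here, which is why recording the X‑ray screen and the sound signal with the same camera makes this sort of subtraction meaningful.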

What this could mean for everyday care

For patients and clinicians, the key message is that a simple, neck‑mounted sensor plus an automatic segmentation algorithm can provide a dependable estimate of how efficiently the throat clears a sip of thin liquid. While it does not capture every phase of swallowing, and may underestimate total swallow time in people with severe mouth‑control problems, it closely tracks the throat phase that matters for clearing material safely. This opens the door to bedside and home‑based screening that can be repeated often, without X‑rays or specialist interpretation. With further validation, such audio‑based measures could support earlier detection of swallowing decline, guide therapy, and help prevent complications like aspiration pneumonia—all by turning the hidden sounds of a swallow into actionable health information.

Citation: Jayatilake, D., Teramoto, Y., Ueno, T. et al. Validation of automated 5 mL thin liquid swallowing sound segmentation for estimating audio-derived pharyngeal clearance time. Sci Rep 16, 11908 (2026). https://doi.org/10.1038/s41598-026-39699-7

Keywords: dysphagia, swallowing sounds, wearable sensors, pharyngeal clearance time, digital health