Clear Sky Science
Evaluating LingualAI: a prospective validation of AI-based real-time translation against certified human interpreters
Bridging Language Gaps in the Doctor’s Office
Millions of people in the United States struggle to communicate with their doctors because they are not fluent in English. Professional interpreters can help, but they are not always available, especially in busy clinics, rural areas, or late-night visits. This study looks at whether a homegrown phone-based tool called LingualAI, which offers real-time English–Spanish translation, can safely support conversations between doctors and patients when a human interpreter is hard to reach.

Why Language Support Matters for Health
More than 25 million people in the U.S. speak English less than “very well,” and this language gap is tied to problems such as misunderstanding diagnoses, missing follow-up visits, and worse health outcomes. Research shows that when patients can speak in their preferred language, care tends to be safer and more effective. Yet hospitals and clinics often lack enough certified interpreters to cover every visit, particularly in primary care and emergency settings. As artificial intelligence tools become more common, health systems are asking whether they can fill part of this gap without putting patients at risk.

How the Researchers Tested LingualAI
The team at UTHealth Houston created three realistic ear, nose, and throat clinic scenarios in both English and Spanish, with scripted lines for a clinician and a patient. Native speakers recorded each line, which was then translated in two ways: by certified medical interpreters and by LingualAI. Nine bilingual clinicians listened to anonymized audio clips, without being told which came from humans or the AI, and rated them on a five-point scale. They judged many aspects of quality, including how accurate the medical terms were, whether the meaning came through clearly, how complete the translation was, and how natural and culturally appropriate the speech sounded.

What the Study Found About Meaning and Style
On the most important question—whether the core medical message made it across—the AI system did surprisingly well. For both medical terminology and overall meaning, LingualAI’s scores were very close to those of certified interpreters. The researchers had defined in advance how much worse the AI could be and still be considered “good enough,” and LingualAI met this bar for meaning, terminology, and completeness of the message. In other words, in these controlled tests, the tool usually said the right medical thing in the right language.
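The "good enough" bar described above is what statisticians call a non-inferiority margin. The sketch below shows one common way such a check can be run on paired ratings. It is an illustration only: this summary does not give the study's actual margin, test, or data, so the function name `noninferior`, the margin values, the normal-approximation confidence bound, and the ratings are all assumptions made up for the example.

```python
import math
import statistics
from statistics import NormalDist


def noninferior(ai_scores, human_scores, margin, alpha=0.05):
    """Paired non-inferiority check (hypothetical helper, not the study's code).

    The AI counts as non-inferior when the one-sided lower confidence
    bound on the mean difference (AI minus human) stays above -margin.
    Uses a normal approximation, which is an assumption of this sketch.
    """
    if len(ai_scores) != len(human_scores):
        raise ValueError("ratings must be paired one-to-one")
    diffs = [a - h for a, h in zip(ai_scores, human_scores)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean paired difference.
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    # One-sided critical value for the chosen alpha.
    z = NormalDist().inv_cdf(1 - alpha)
    lower_bound = mean_diff - z * std_err
    return lower_bound > -margin, mean_diff, lower_bound


# Invented five-point ratings for eight clips (NOT data from the study).
ai = [4, 5, 4, 4, 5, 4, 5, 4]
human = [5, 5, 4, 4, 5, 5, 5, 4]

passed, mean_diff, lower_bound = noninferior(ai, human, margin=1.0)
print(passed, mean_diff, round(lower_bound, 3))
```

The key design point is that the margin is fixed before looking at the data. A stricter margin makes the same ratings harder to pass, which is how a tool can meet the bar on one quality dimension and miss it on another.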

Where Human Interpreters Still Shine
The picture changed once listeners focused on how the words were delivered. Human interpreters scored clearly higher on grammar, word choice, and cultural fit, as well as on how smooth, natural, and expressive the speech sounded. The AI’s voice tended to be more mechanical, with awkward pauses and a flat tone that could make reassurance or empathy feel less genuine. When asked which version they preferred, raters leaned strongly toward human interpreters for speech flow, rhythm, and overall trust. These differences mattered enough that the AI did not meet the preset standard for being “not worse than” humans in these delivery-focused areas.

Speed, Cost, and a Shared-Responsibility Model
LingualAI translated each spoken line in about ten seconds, fast enough to fit into a natural back-and-forth conversation. It was also far cheaper to operate than traditional phone or video interpreting services, with estimated costs of only a few cents for a 10-minute conversation compared with several dollars for a human service. Because of this, the authors suggest an “interpreter-in-the-loop” model. In this approach, LingualAI would handle routine, low-risk exchanges, while certified interpreters would step in for critical decisions, emotional discussions, or whenever the AI’s confidence is low or a clinician or patient asks for human help.
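The escalation rule in this "interpreter-in-the-loop" model can be pictured as a simple routing check. The sketch below is a minimal illustration of the conditions listed above, not the authors' implementation: the `Exchange` class, its field names, the `needs_human_interpreter` function, and the 0.8 confidence threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Exchange:
    """One turn of a clinic conversation (hypothetical structure)."""
    critical_decision: bool = False     # e.g. consent or a treatment choice
    emotional_discussion: bool = False  # e.g. delivering difficult news
    human_requested: bool = False       # clinician or patient asked for a person
    ai_confidence: float = 1.0          # assumed self-reported score in [0, 1]


def needs_human_interpreter(exchange, min_confidence=0.8):
    """Return True when a certified interpreter should take over.

    The 0.8 threshold is an assumption for illustration only.
    """
    return (
        exchange.critical_decision
        or exchange.emotional_discussion
        or exchange.human_requested
        or exchange.ai_confidence < min_confidence
    )


# A routine exchange stays with the AI tool; a low-confidence one escalates.
routine = Exchange(ai_confidence=0.95)
uncertain = Exchange(ai_confidence=0.55)
print(needs_human_interpreter(routine), needs_human_interpreter(uncertain))
```

Any single trigger is enough to escalate, so the rule errs toward bringing in a human whenever there is doubt.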

What This Means for Patients and Clinicians
For people who face language barriers, this study offers cautious optimism. LingualAI appears capable of carrying medical meaning across languages reasonably well, especially for common English–Spanish conversations. At the same time, the tool still falls short of human interpreters in warmth, nuance, and reliability for high-stakes talks. The authors conclude that AI translation should not replace certified interpreters, but it can be a useful backup when human help is delayed or unavailable, as long as human experts remain involved in the most sensitive and important parts of care.

Citation: Singh, U.P., Jaimes Garcia, C.A., Aisenberg, G.M. et al. Evaluating LingualAI: a prospective validation of AI-based real-time translation against certified human interpreters. npj Health Syst. 3, 29 (2026). https://doi.org/10.1038/s44401-026-00080-5
Keywords: medical translation, language barriers, AI in healthcare, clinical communication, interpreters