Clear Sky Science
Improving non-expert performance in musculoskeletal MRI protocoling through a large language model
Smarter Scans for Aching Joints
When you hurt a knee, shoulder, or back, doctors often order an MRI to look inside your muscles, joints, and bones. But getting a useful MRI is not as simple as pressing a button. Technicians and junior doctors must choose from dozens of scan options, and mistakes can mean blurry or incomplete pictures, wasted time, and a second trip into the scanner. This study asks a timely question: can a modern AI language tool quietly sit beside these non-experts and help them pick better musculoskeletal MRI settings the first time?

Why Choosing the Right MRI Matters
Musculoskeletal MRI covers a huge variety of body parts—from fingers to spine—and many different problems, such as sports injuries, arthritis, and infections. Each situation may require a different combination of scan angles, areas to cover, and tricks to reduce metal artifacts from implants. Hospitals therefore maintain large libraries of protocols, often hundreds of variations, that have evolved through years of local trial and error. As imaging workloads grow and specialist radiologists are stretched thin, more of this "protocoling" work is handed to residents and radiographers. Errors can force patients to return for repeat scans, disrupting schedules and delaying diagnoses.
Bringing an AI Co-Pilot into the Control Room
The researchers tested a commercially available large language model (GPT‑4o) as an assistant rather than a replacement for human staff. They first gathered more than 12,000 past musculoskeletal MRI orders from one hospital to design a detailed instruction prompt for the AI. For each case, the AI received anonymized information taken from the electronic medical record: the type of exam ordered, the clinician’s comments, recent imaging reports, and relevant medical notes. Through an iterative two‑week process, the team refined a single long prompt so that the AI would output a structured worksheet: the recommended protocol name, the exact area to scan, whether to use metal‑artifact reduction, how to suppress fat signals, what coil to use, and other details. They built rules into the prompt to minimize guesswork and keep the AI’s responses consistent.
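The paper does not publish its exact prompt or output schema, but the structured worksheet it describes could be represented roughly as follows. This is a minimal sketch: every field name and the `parse_worksheet` helper are illustrative assumptions, not the authors' actual format.

```python
from dataclasses import dataclass

@dataclass
class ProtocolWorksheet:
    # All field names below are illustrative guesses at the worksheet
    # described in the study, not the authors' actual schema.
    protocol_name: str               # named entry from the hospital's protocol library
    coverage: str                    # the exact anatomical area to scan
    metal_artifact_reduction: bool   # whether implants call for artifact-reduction sequences
    fat_suppression: str             # e.g. "STIR", "Dixon", or "none"
    coil: str                        # receive-coil choice
    rationale: str                   # the model's free-text reasoning, shown to reviewers

def parse_worksheet(fields: dict) -> ProtocolWorksheet:
    """Validate a structured model response into a worksheet (sketch only)."""
    return ProtocolWorksheet(
        protocol_name=fields["protocol_name"],
        coverage=fields["coverage"],
        metal_artifact_reduction=bool(fields.get("metal_artifact_reduction", False)),
        fat_suppression=fields.get("fat_suppression", "none"),
        coil=fields.get("coil", "default"),
        rationale=fields.get("rationale", ""),
    )
```

Forcing the model into a fixed set of fields like this is one way to "minimize guesswork and keep responses consistent", since a missing or malformed field is caught before a human ever sees the recommendation.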
Putting Non-Experts and AI to the Test
To see whether the AI actually helped people, the team assembled a separate test set of 107 new MRI orders. Three radiology residents and three radiographers, all with less than a year of experience, were asked to complete protocol worksheets for every case twice: once using only their own judgment, and once with AI assistance. A crossover design and a six‑month gap between sessions helped prevent simple memory effects. Two expert musculoskeletal radiologists, working from their own gold‑standard protocols, scored every worksheet on a four‑point "clinical pass" scale that reflected real‑world consequences, from a complete failure requiring a full repeat scan up to an excellent match.

What Changed When AI Joined the Team
With the AI’s help, average scores rose for both residents and radiographers, and the improvement was statistically significant. The most meaningful difference was not in small score shifts, but in shrinking the share of protocols judged likely to trigger a partial or full repeat MRI. For residents, these risky cases dropped by about 12%; for radiographers, by about 8%. In a busy department handling around 40 such exams a day, that could translate into several fewer problematic scans daily. The AI’s own outputs were reasonably stable: when the same case was run five times, its scores agreed well, and for more than half of the cases every one of the five runs produced an excellent plan. When it erred, problems usually involved fine technical choices, such as specific fat‑suppression methods or exact targeting, and human reviewers could often spot these issues in the AI’s accompanying reasoning.
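The back-of-envelope arithmetic behind "several fewer problematic scans daily" can be made explicit. The 40-exams-a-day volume is the article's own illustrative figure; the helper function is a sketch, not part of the study.

```python
def fewer_risky_scans(daily_exams: int, risk_drop: float) -> float:
    """Expected daily reduction in repeat-prone protocols,
    given a drop in the share of risky cases (e.g. 0.12 for 12%)."""
    return daily_exams * risk_drop

# Using the article's illustrative figures:
residents = fewer_risky_scans(40, 0.12)      # about 4.8 fewer risky protocols per day
radiographers = fewer_risky_scans(40, 0.08)  # about 3.2 fewer risky protocols per day
```

Even a few avoided repeat scans per day compounds over a year into meaningful scanner time and fewer patient recalls.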
How People Used and Misused the Help
Surveys of the participating staff revealed how humans actually interacted with the tool. They found the AI’s extra comments—about where exactly to focus the scan, which coil to use, and why a given protocol made sense—especially valuable. Many participants later recalled and reused ideas from the AI even when they were working without it, suggesting a training benefit. At the same time, the study uncovered traces of automation bias and confirmation bias: participants tended to lean on the AI more when it matched their initial instinct, and sometimes followed it even when its standalone score was poor. Still, overall performance rarely worsened with AI help, and only in a small fraction of cases did its suggestions make the final protocol worse.
What This Means for Patients and Clinics
To a patient lying in an MRI scanner, the behind‑the‑scenes details of protocol selection are invisible. Yet this study suggests that a carefully designed AI language model, used through an ordinary chat interface, can quietly raise the quality of musculoskeletal MRI planning by junior staff and reduce the number of scans that need to be redone. The system is not a replacement for expert radiologists, and its performance depends on local rules and thoughtful oversight. But as a practical co‑pilot that helps non‑experts choose better settings the first time, it offers a glimpse of how AI could make advanced imaging both more efficient and more patient‑friendly.
Citation: Lee, S., Choi, H., Chun, K.S. et al. Improving non-expert performance in musculoskeletal MRI protocoling through a large language model. Sci Rep 16, 12423 (2026). https://doi.org/10.1038/s41598-026-41898-1
Keywords: musculoskeletal MRI, radiology workflow, large language models, clinical decision support, medical imaging AI