Clear Sky Science
Clinical utility of foundation models in musculoskeletal MRI for biomarker fidelity and predictive outcomes
Why this matters for joint health
Many people live with knee, hip, shoulder, or back pain, yet most MRI scans of these joints still produce written reports rather than hard numbers about tissue health. This study shows how a new kind of artificial intelligence system can turn routine musculoskeletal MRI into consistent measurements that help doctors both manage daily workload and estimate which patients are more likely to need joint surgery in the future.

Turning pictures into useful joint measurements
The researchers built a modular system that takes standard MRI scans of five key body regions: the knee, hip, shoulder, lumbar spine, and thigh. Using large "foundation" segmentation models originally trained to outline objects in many kinds of images, they fine-tuned these tools to recognize cartilage, bone, muscle, fat, and nerves in joint MRI scans. Once these structures are outlined, the system computes numeric measurements such as cartilage thickness, muscle volume, disc height in the spine, and relaxation times that reflect tissue composition. These numbers act as biomarkers, giving a repeatable way to describe how healthy or worn different tissues are.
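As a rough illustration of how an outline becomes a number, the sketch below computes a tissue volume from a binary segmentation mask by counting labeled voxels and multiplying by the size of one voxel. The function name, the toy mask, and the voxel spacing are assumptions made for this example and are not taken from the study.

```python
from math import prod


def mask_volume_mm3(mask, spacing_mm):
    """Volume of a binary 3D mask: labeled voxel count times single-voxel volume."""
    voxel_count = sum(1 for plane in mask for row in plane for v in row if v)
    return voxel_count * prod(spacing_mm)


# Toy 2 x 2 x 3 mask with five labeled voxels (illustrative data only).
toy_mask = [
    [[1, 1, 0], [0, 1, 0]],
    [[0, 0, 0], [1, 1, 0]],
]
toy_spacing_mm = (0.5, 0.5, 2.0)  # hypothetical voxel size along each axis

volume = mask_volume_mm3(toy_mask, toy_spacing_mm)
print(volume)  # 5 voxels * 0.5 mm^3 each = 2.5
```

Measurements such as cartilage thickness or disc height need more geometry than this, but they start from the same kind of mask.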
Making the AI reliable enough for the clinic
To be useful in real care, the automated measurements must closely match what expert radiologists would produce by hand. The team tested their system on 913 MRI exams from 12 different datasets, covering a mix of scanning machines, settings, and body regions. They compared the borders drawn by the AI models with expert outlines and then compared the resulting measurements. After fine-tuning, the best model consistently reached high overlap scores and produced cartilage thickness, disc height, and muscle volume values that were almost indistinguishable from expert results, often agreeing within a fraction of a millimeter or a few percent. The system also held up well across different scan protocols, including lower-quality or undersampled images, suggesting it can handle the variety found in everyday practice.
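One widely used overlap score for comparing an automated outline with an expert outline is the Dice coefficient, which ranges from 0 (no overlap) to 1 (identical). This summary does not say which overlap metric the team used, so the sketch below is only an example of the general idea, with made-up toy masks.

```python
def dice(pred, ref):
    """Dice coefficient between two flattened binary masks of equal length."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Illustrative masks: three labeled voxels each, two of them shared.
model_mask = [1, 1, 1, 0, 0, 0]
expert_mask = [0, 1, 1, 1, 0, 0]

score = dice(model_mask, expert_mask)
print(round(score, 3))  # 2 * 2 / (3 + 3) = 0.667
```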
Automating the hardest parts of MRI analysis
Manually drawing boxes and outlines around every structure in hundreds of images is slow work. To avoid this bottleneck, the authors added an object detection step that automatically proposes regions of interest before segmentation. This "AutoLabel" setup processes a 3D knee scan in roughly half a minute on a modern graphics card. Using automated prompts caused a small drop in accuracy compared with carefully drawn boxes, but the measurements remained within clinically acceptable limits. The authors also examined how scan settings, such as echo time and flip angle, influence performance, and found that training on many different protocols together can reduce these effects. Because the system is modular, new segmentation models can be plugged in later without changing how measurements or clinical decisions are made.
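The detect-then-segment-then-measure flow described above can be sketched as three interchangeable steps. Everything in this example is a simplified stand-in: the toy detector, the threshold "segmenter", and the pixel-count measurement are hypothetical placeholders, not the study's models. The sketch only shows how swapping the segmenter leaves the measurement step untouched.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]
Mask = List[List[int]]
Box = Tuple[int, int, int, int]  # row_start, row_stop, col_start, col_stop (stops exclusive)


def run_pipeline(image: Image,
                 detect: Callable[[Image], Box],
                 segment: Callable[[Image, Box], Mask],
                 measure: Callable[[Mask], float]) -> float:
    """Propose a region, segment inside it, then measure the resulting mask."""
    box = detect(image)
    mask = segment(image, box)
    return measure(mask)


def toy_detect(image: Image) -> Box:
    """Bounding box of all non-zero pixels (assumes the image is not empty)."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for row in image for c, v in enumerate(row) if v]
    return (min(rows), max(rows) + 1, min(cols), max(cols) + 1)


def toy_segment(image: Image, box: Box, threshold: int = 5) -> Mask:
    """Label pixels inside the box whose intensity reaches the threshold."""
    r0, r1, c0, c1 = box
    return [[1 if r0 <= r < r1 and c0 <= c < c1 and v >= threshold else 0
             for c, v in enumerate(row)]
            for r, row in enumerate(image)]


def pixel_area(mask: Mask) -> float:
    """Measurement layer: number of labeled pixels."""
    return float(sum(map(sum, mask)))


toy_image = [
    [0, 0, 0, 0],
    [0, 7, 9, 0],
    [0, 3, 8, 0],
    [0, 0, 0, 0],
]

area_a = run_pipeline(toy_image, toy_detect, toy_segment, pixel_area)
# Swap in a different "model" without touching detection or measurement.
area_b = run_pipeline(toy_image, toy_detect,
                      lambda img, box: toy_segment(img, box, threshold=2),
                      pixel_area)
print(area_a, area_b)  # 3.0 4.0
```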

Helping radiologists focus and looking ahead in time
Using knee MRI as a test case, the team demonstrated two concrete clinical uses of their measurement pipeline. First, they built a multi-stage triage system that uses cartilage thickness and bone volume measurements to flag scans more likely to show important joint damage. In simulations on 930 knee exams, this cascade could safely remove more than half of the scans from urgent review while still keeping nearly all of the serious cases in the queue, trimming hours of verification work per thousand studies. Second, they followed people from the Osteoarthritis Initiative over several years, using changes in cartilage and meniscus thickness to predict which knees would go on to develop radiographic osteoarthritis or require total knee replacement. Models that included these imaging biomarkers, along with age, sex, and body mass index, identified higher-risk knees more accurately than models using demographics alone.
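A multi-stage triage rule of this general kind can be sketched as a series of checks, where an exam stays in the urgent queue if any stage flags it. The stage order, the cut-off values, and the exam records below are hypothetical and chosen only to make the example run. They are not the thresholds or data used in the study.

```python
from dataclasses import dataclass


@dataclass
class KneeExam:
    exam_id: str
    cartilage_thickness_mm: float
    bone_volume_cm3: float


# Hypothetical cut-offs, for illustration only.
MIN_CARTILAGE_MM = 2.0
BONE_VOLUME_RANGE_CM3 = (250.0, 400.0)


def cartilage_stage(exam: KneeExam) -> bool:
    """Flag exams whose cartilage looks thin."""
    return exam.cartilage_thickness_mm < MIN_CARTILAGE_MM


def bone_stage(exam: KneeExam) -> bool:
    """Flag exams whose bone volume falls outside the assumed normal range."""
    low, high = BONE_VOLUME_RANGE_CM3
    return not (low <= exam.bone_volume_cm3 <= high)


STAGES = (cartilage_stage, bone_stage)


def triage(exams):
    """Split exam IDs into urgent and routine queues; any flag keeps an exam urgent."""
    urgent, routine = [], []
    for exam in exams:
        # any() stops at the first stage that flags the exam.
        if any(stage(exam) for stage in STAGES):
            urgent.append(exam.exam_id)
        else:
            routine.append(exam.exam_id)
    return urgent, routine


toy_exams = [
    KneeExam("A", 2.5, 300.0),  # passes both stages
    KneeExam("B", 1.4, 310.0),  # thin cartilage
    KneeExam("C", 2.4, 450.0),  # bone volume out of range
]

urgent, routine = triage(toy_exams)
print(urgent, routine)  # ['B', 'C'] ['A']
```

Keeping an exam urgent whenever any stage flags it favors catching serious cases over shrinking the queue, which matches the stated goal of keeping nearly all serious cases in review.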
What this means for future joint care
Overall, the study shows that foundation models can do more than just draw neat outlines on MRI scans: when carefully adapted and checked against expert measurements, they can supply stable, quantitative biomarkers that plug directly into clinical tools. By separating the measurement layer from the specific AI model, the framework lets hospitals upgrade segmentation technology over time without rebuilding their decision systems. This creates a path in which today’s routine scans feed tomorrow’s personalized joint care, supporting both faster worklists for radiologists now and better risk stratification for patients who may face osteoarthritis or joint replacement later.
Citation: Hoyer, G., Tong, M.W., Bhattacharjee, R. et al. Clinical utility of foundation models in musculoskeletal MRI for biomarker fidelity and predictive outcomes. npj Digit. Med. 9, 383 (2026). https://doi.org/10.1038/s41746-026-02520-w
Keywords: musculoskeletal MRI, osteoarthritis, knee replacement, medical AI, cartilage thickness