Clear Sky Science
Development and validation of an artificial intelligence-based model for diagnosing benign, borderline, and malignant adnexal masses
Why this matters for women’s health
Ovarian and other adnexal masses are common findings on pelvic ultrasound, yet deciding which ones are harmless and which signal early cancer is still hard, even for experts. This study reports a new artificial intelligence (AI) system, called Clinical‑OMTA, that reads ultrasound images and helps doctors sort these masses into three key groups—benign, borderline, and malignant—so that women get the right treatment while avoiding unnecessary surgery.
Three kinds of growths, three very different choices
Not all adnexal masses are created equal. Benign growths can often be watched or removed with simple surgery. Malignant tumors are life‑threatening cancers that need specialist surgery and chemotherapy. Borderline tumors sit uneasily in between: they can recur, yet they often affect younger women who wish to keep their fertility, so surgeons try to remove only what is necessary. Unfortunately, on ultrasound these three categories can look very similar. Borderline tumors in particular can mimic either a harmless cyst or an aggressive cancer, making treatment decisions stressful for patients and clinicians alike.
Turning complex scans into clearer answers
Ultrasound is usually the first and most widely available test for adnexal masses, but interpreting the grainy, highly variable images demands considerable experience. Existing scoring systems and risk calculators, such as the widely used ADNEX model, combine specific ultrasound features with simple clinical information like age and a blood marker (CA125), yet they still rely on human observers to describe the images correctly. Recent work in deep learning—a branch of AI that learns patterns directly from pixels—offers a chance to bypass some of this subjectivity by training computers to recognize subtle image signatures of different tumor types.
An AI assistant trained across many hospitals
Building on earlier work, the authors designed Clinical‑OMTA, a dual‑pathway AI model that first separates benign from non‑benign masses and then distinguishes borderline from malignant ones. The system digests greyscale ultrasound images and can also take age and CA125 values as optional inputs. To teach and test the model, the team assembled a large, diverse dataset: 2381 women from 23 hospitals across China, scanned on 38 types of ultrasound machines. Most cases had surgical confirmation of the diagnosis; a smaller group of clearly benign cysts was confirmed by at least six months of ultrasound follow‑up. The data were split into training sets, internal test sets, and two fully independent external test cohorts, including both still images and short video sweeps of the ovaries. 
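For readers who think in code, the two-step logic described above can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the function names, the 0.5 thresholds, and the idea that each pathway emits a single probability are all assumptions made for the example.

```python
# Minimal sketch of a two-stage (cascade) decision, assuming each pathway
# outputs one probability. Names and thresholds are hypothetical.

def cascade_classify(p_non_benign: float,
                     p_malignant_given_non_benign: float,
                     t_stage1: float = 0.5,
                     t_stage2: float = 0.5) -> str:
    """Return 'benign', 'borderline', or 'malignant' from two stage outputs."""
    for p in (p_non_benign, p_malignant_given_non_benign):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Stage 1: benign versus non-benign.
    if p_non_benign < t_stage1:
        return "benign"
    # Stage 2: only reached for non-benign masses.
    if p_malignant_given_non_benign < t_stage2:
        return "borderline"
    return "malignant"


def cascade_probabilities(p_non_benign: float,
                          p_malignant_given_non_benign: float) -> dict:
    """Combine the two stage outputs into three class probabilities."""
    return {
        "benign": 1.0 - p_non_benign,
        "borderline": p_non_benign * (1.0 - p_malignant_given_non_benign),
        "malignant": p_non_benign * p_malignant_given_non_benign,
    }
```

Under these assumptions, a mass scored 0.8 at the first stage and 0.3 at the second would be labelled borderline. A real system would tune both thresholds against clinical costs rather than fixing them at 0.5.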
How well the AI performed in real‑world settings
On external test images, Clinical‑OMTA correctly separated benign, borderline, and malignant masses with accuracy similar to both the ADNEX model and the judgment of an expert ultrasound examiner. Its performance was stable across different ultrasound brands, scanning methods (through the abdomen or the vagina), and the two external hospitals, suggesting that the model is not overly tuned to one particular device or center. The system also worked well on video clips, not just still frames. Interestingly, feeding in age and CA125 did not improve its decisions over using ultrasound images alone, echoing earlier studies showing that this blood marker adds little when high‑quality imaging is available. 
Helping less‑experienced doctors, and its limits
The researchers then asked 11 radiologists—junior, intermediate, and highly experienced—to classify the same cases, first unaided and then with the AI’s output and heat‑map overlays that highlight image regions the model finds important. With Clinical‑OMTA’s help, junior doctors’ accuracy jumped by about 18–20 percentage points, and intermediate readers also improved markedly, reaching near‑expert performance. Agreement between readers, which had previously ranged from only fair to moderate, rose to very high levels when they used the tool. At the same time, the study notes that such strong alignment may reflect “automation bias,” where clinicians lean too heavily on the AI, particularly in the most ambiguous borderline cases. The authors therefore stress that heat maps are research tools, not stand‑alone explanations, and that AI guidance must be integrated carefully into clinical training and decision‑making.
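Two quantities sit behind statements like these: accuracy against the confirmed diagnosis, and agreement between readers. The summary does not name the agreement statistic used, so the sketch below assumes Cohen's kappa for one pair of readers, a common choice that corrects raw agreement for chance. The reader labels are made up for illustration.

```python
from collections import Counter


def accuracy(truth: list, predicted: list) -> float:
    """Fraction of cases where the predicted label matches the truth."""
    if not truth or len(truth) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def cohen_kappa(reader_a: list, reader_b: list) -> float:
    """Chance-corrected agreement between two readers (assumed statistic)."""
    n = len(reader_a)
    if n == 0 or n != len(reader_b):
        raise ValueError("inputs must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    counts_a, counts_b = Counter(reader_a), Counter(reader_b)
    # Agreement expected if both readers labelled independently at their own rates.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two readers on six cases.
reader_a = ["benign", "benign", "borderline", "malignant", "malignant", "benign"]
reader_b = ["benign", "borderline", "borderline", "malignant", "benign", "benign"]
kappa = cohen_kappa(reader_a, reader_b)  # 11/23, roughly 0.48
```

On the widely used Landis and Koch scale, a value near 0.48 would count as moderate agreement. With 11 readers, a multi-rater statistic such as Fleiss' kappa may be what a study actually reports.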
What this means for patients
Overall, Clinical‑OMTA shows that an AI system trained on diverse ultrasound data can match expert performance in classifying adnexal masses into benign, borderline, and malignant categories, while substantially boosting the skills and consistency of less‑experienced radiologists. Because it works across different machines and centers, the model could eventually be embedded in scanners or used as stand‑alone software to support doctors in busy or under‑resourced clinics. The authors caution that further prospective and international studies are needed before routine use, especially in settings with lower‑end equipment or non‑specialist operators. Still, their work points toward a future in which more women, regardless of where they are treated, can benefit from expert‑level interpretation of ovarian ultrasound scans and more tailored, timely care.
Citation: Wu, Y., Dai, W., Li, X. et al. Development and validation of an artificial intelligence-based model for diagnosing benign, borderline, and malignant adnexal masses. npj Precis. Onc. 10, 106 (2026). https://doi.org/10.1038/s41698-026-01320-5
Keywords: ovarian ultrasound, artificial intelligence, adnexal masses, borderline ovarian tumors, clinical decision support