Clear Sky Science
Performance of breast cancer risk prediction algorithms across mammography systems in the UK screening programme
Why this matters for women and families
Breast screening saves lives by finding cancers early, yet many tumors still appear in the years between routine mammograms, often at a more advanced stage. This study asks a simple but important question: can artificial intelligence (AI) read a “normal” mammogram and quietly flag women who are actually at high short‑term risk, so they can be offered extra checks before a cancer grows and spreads?

Seeing more in a normal mammogram
Most national screening programs, including the UK’s, invite women for a mammogram every three years. If nothing suspicious is seen, they are told the scan is “negative” and they return to usual life. Yet about 30% of breast cancers in screened women are “interval cancers” that surface between scheduled visits and tend to have a poorer outlook. Recently, powerful AI systems have learned to scan mammograms that appear normal to human readers and assign each woman a short‑term risk score. The idea is to use this hidden information to tailor how often women are screened and who should be offered more sensitive tests such as MRI or contrast‑enhanced mammography.
Putting four AI tools to the test
The researchers examined 112,621 negative screening mammograms from two NHS Breast Screening Programme sites in England, covering one full three‑year round from 2014 to 2017 and following women for five years. The two sites used different digital mammography machines (Philips and GE), mirroring real‑world variation. Over follow‑up, 1,225 women developed breast cancer, including 396 interval cancers and additional cancers found at the next screening round. Four leading AI risk algorithms—three commercial and one academic model—were run locally on every mammogram to generate a risk score for future cancer, and their performance was compared.
How well the algorithms spotted future cancers
All four AI systems were able to distinguish, better than chance, between women who would and would not go on to develop cancer, but they did not perform equally well. One algorithm (labelled DL‑1) consistently showed the strongest performance, while another (DL‑3) lagged behind. When the team focused on interval cancers, those appearing after a “normal” scan and before the next scheduled one, the best model separated future cases from non‑cases as well as, or better than, the models in previous single‑algorithm studies. Importantly, three of the four tools behaved similarly on both Philips and GE images, suggesting they can cope with at least some differences in scanning hardware, although one algorithm did noticeably worse on one of the two systems.
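For readers who want to see what "better than chance" means in practice, the sketch below computes a common discrimination measure, the area under the ROC curve (AUC): the probability that a randomly chosen woman who later develops cancer has a higher risk score than a randomly chosen woman who does not. A value of 0.5 is chance and 1.0 is perfect separation. Using AUC here is an assumption for illustration, since this summary does not name the study's exact metric, and the scores and outcomes are made-up toy values, not study data.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: share of (case, non-case) pairs in which the case
    has the higher score. Tied pairs count as half a win."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Hypothetical toy data: 1 = developed cancer during follow-up, 0 = did not.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 0, 1, 0, 0, 1, 0, 0]

toy_auc = auc(toy_scores, toy_labels)
print(f"Toy AUC: {toy_auc:.3f}")  # 11 of 15 pairs ranked correctly -> 0.733
```

The pairwise loop is fine for a small example. For a dataset the size of the one described here, a sorted or library-based implementation would be the more practical choice.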
What happens if we act on the highest risk scores?
The practical question for screening services is how many women to recall based on AI scores. The researchers therefore looked at clinically meaningful cut‑points. If only the 4% of women with the highest risk scores (ranked separately by each tool) were selected for extra attention, the best two algorithms together captured about one in five of all future cancers and more than a quarter of interval cancers. When the threshold was relaxed to include the top 14% of risk scores, closer to recall rates seen in some North American programs, the yield roughly doubled: the strongest model identified around 42% of all future cancers and half of the interval cancers. However, each algorithm tended to flag a partly different subset of cancers, with relatively little overlap, hinting that ensembles or multi‑tool strategies might find more tumors than any single model alone.
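The cut‑point logic described above can be sketched in a few lines: rank women by risk score, flag a fixed top fraction, and count how many later cancers fall inside the flagged group. The example also shows how two tools can flag different cancers. This is a minimal sketch under stated assumptions: the function names, the two tools, the ten women and their outcomes are all hypothetical toy values, and the 20% cut‑off is chosen only because the toy group is so small.

```python
from __future__ import annotations


def flag_top_fraction(scores: dict[str, float], fraction: float) -> set[str]:
    """Return the IDs of the women whose scores fall in the top `fraction`."""
    n_flag = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_flag])


def capture_rate(flagged: set[str], cancers: set[str]) -> float:
    """Share of later cancers that occurred in the flagged group."""
    return len(flagged & cancers) / len(cancers)


# Hypothetical risk scores from two hypothetical tools for the same ten women.
tool_a = {"w1": 0.95, "w2": 0.90, "w3": 0.40, "w4": 0.35, "w5": 0.30,
          "w6": 0.25, "w7": 0.20, "w8": 0.15, "w9": 0.10, "w10": 0.05}
tool_b = {"w1": 0.50, "w2": 0.20, "w3": 0.92, "w4": 0.30, "w5": 0.91,
          "w6": 0.10, "w7": 0.15, "w8": 0.25, "w9": 0.05, "w10": 0.12}
toy_cancers = {"w1", "w3", "w6", "w9"}  # women later diagnosed (toy outcome)

flagged_a = flag_top_fraction(tool_a, 0.20)  # top 2 of 10 by tool A
flagged_b = flag_top_fraction(tool_b, 0.20)  # top 2 of 10 by tool B

# Cancers caught by both tools (empty here: each tool finds a different one).
shared = (flagged_a & toy_cancers) & (flagged_b & toy_cancers)

print(capture_rate(flagged_a, toy_cancers))              # 0.25
print(capture_rate(flagged_b, toy_cancers))              # 0.25
print(capture_rate(flagged_a | flagged_b, toy_cancers))  # 0.5
print(shared)                                            # set()
```

Combining the two flagged groups doubles the cancers captured in this toy case, but it also flags four women instead of two. Any multi‑tool strategy would need to weigh the extra cancers found against the larger number of women offered additional imaging.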

Strengths, gaps, and next steps
This work stands out because it uses complete, routine data from two large NHS screening centers rather than a narrowly selected research sample, and it is the first to evaluate several named AI risk tools side‑by‑side in the UK setting. At the same time, there are limitations. Women with implants or non‑standard imaging views were excluded, and the study covered only two mammography brands, so performance on other equipment or across different ethnic groups remains uncertain. Because the analysis was retrospective, some cancers that might have been found earlier with risk‑based extra imaging were not counted, meaning the true benefit could be larger than reported.
What this means for future breast screening
For a lay reader, the conclusion is that modern AI can indeed find warning signs in “normal” mammograms that predict which women are more likely to develop breast cancer soon, especially interval cancers that are otherwise hard to catch early. The best algorithms could, in principle, let screening programs offer more frequent or more sensitive tests to a relatively small group of higher‑risk women, while others continue with standard three‑year checks. Yet the differences between tools, and between imaging systems, show that no single AI model is ready to be adopted everywhere without careful testing. The authors argue for large prospective trials using multiple algorithms, along with fine‑tuning for local scanners and populations, before AI‑guided, risk‑based breast screening can safely become routine care.
Citation: Rothwell, J., Payne, N., Kilburn-Toppin, F. et al. Performance of breast cancer risk prediction algorithms across mammography systems in the UK screening programme. npj Digit. Med. 9, 330 (2026). https://doi.org/10.1038/s41746-026-02507-7
Keywords: breast cancer screening, artificial intelligence, mammography, risk prediction, interval cancers