Clear Sky Science

BreastDCEDL: A standardized deep learning-ready breast DCE-MRI dataset of 2,070 patients

2026-01-15

Why this matters for breast cancer care

When someone is diagnosed with breast cancer, doctors must quickly decide which treatments are likely to work best. Powerful MRI scans can show how a tumor behaves, but turning those scans into reliable, computer-based tools to guide treatment has been difficult. This article introduces BreastDCEDL, a large, carefully prepared collection of breast MRI scans designed specifically to help researchers build and test artificial intelligence (AI) systems that predict how tumors will respond to therapy.

Seeing tumors change over time

Doctors often use a special type of MRI called dynamic contrast-enhanced MRI (DCE-MRI) to view breast tumors. In this scan, images are taken before and after a contrast dye is injected, capturing how blood flows through the tumor over several minutes. Cancerous tissue tends to have leaky, disorganized blood vessels, so it lights up and fades differently than normal tissue. These time-lapse images can reveal how aggressive a tumor is, and may help predict whether it will disappear completely after powerful medicines such as chemotherapy.

Turning scattered scans into one clear resource

Until now, progress in AI for breast MRI has been slowed by scattered data: different hospitals store images in different formats, use different scanners, and record clinical details in different ways. The BreastDCEDL project tackled this problem by pulling together pretreatment DCE-MRI scans from 2,070 patients across three major research cohorts: I-SPY1, I-SPY2, and Duke. The team converted more than 8.5 million individual image slices into just over eleven thousand 3D volumes using a standard format widely used in medical imaging research. They also carefully ordered the images in time (before contrast, early after, and later after) and in space, so that each patient’s scans line up correctly.
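For readers who want a feel for what "sorting in time and space" involves, here is a minimal sketch. It is not the authors' pipeline: the Slice record, its field names, and the build_volumes function are hypothetical stand-ins for the metadata a scanner attaches to each slice, and real data would need far more care.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    """One 2D image slice; all fields are hypothetical stand-ins for scanner metadata."""
    acquisition_time: float  # seconds since the first acquisition
    z_position: float        # slice location along the scan axis, in mm
    pixels: tuple            # placeholder for the 2D pixel data


def build_volumes(slices):
    """Group slices by acquisition time, then order each group by z position.

    Returns a list of volumes ordered in time (pre-contrast first); each
    volume is a list of slices ordered in space.
    """
    by_time = defaultdict(list)
    for s in slices:
        by_time[s.acquisition_time].append(s)
    return [
        sorted(by_time[t], key=lambda s: s.z_position)
        for t in sorted(by_time)
    ]


# Three time points with two slices each, deliberately shuffled.
shuffled = [
    Slice(90.0, 2.0, ("early", "z2")),
    Slice(0.0, 2.0, ("pre", "z2")),
    Slice(300.0, 1.0, ("late", "z1")),
    Slice(0.0, 1.0, ("pre", "z1")),
    Slice(90.0, 1.0, ("early", "z1")),
    Slice(300.0, 2.0, ("late", "z2")),
]
volumes = build_volumes(shuffled)
```

Each entry of volumes is then one 3D stack for a single time point, with the pre-contrast stack first.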

Marking the tumors and matching the facts

For AI to learn, it must know where the tumor is and what happened to the patient. In BreastDCEDL, each patient has tumor markings and key clinical information. For the I-SPY cohorts, complex computer codes describing tumor outlines were decoded into simple 3D masks that mark tumor areas voxel by voxel. For the Duke cohort, expert radiologists drew bounding boxes around the largest tumor in each case. Alongside the images, the dataset includes patient age, basic demographic details, tumor size, hormone receptor (HR) status, HER2 status, and whether the tumor fully disappeared after treatment, a result called pathologic complete response (pCR). This outcome, available for 1,452 patients, is closely linked to long-term survival and is a prime target for prediction models.
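The difference between a voxel mask and a bounding box can be made concrete with a small sketch. The function names, the box convention (inclusive start, exclusive end), and the toy volume size below are assumptions chosen for illustration, not the dataset's own conventions.

```python
def box_to_mask(shape, box):
    """Return a 3D nested-list mask with 1 inside the box and 0 elsewhere.

    shape: (depth, height, width) of the volume in voxels.
    box:   ((z0, z1), (y0, y1), (x0, x1)), inclusive start, exclusive end
           (an assumed convention for this sketch).
    """
    depth, height, width = shape
    (z0, z1), (y0, y1), (x0, x1) = box
    return [
        [
            [1 if (z0 <= z < z1 and y0 <= y < y1 and x0 <= x < x1) else 0
             for x in range(width)]
            for y in range(height)
        ]
        for z in range(depth)
    ]


def count_voxels(mask):
    """Count the voxels marked 1 in a nested-list mask."""
    return sum(v for plane in mask for row in plane for v in row)


# Toy 4 x 5 x 6 volume with a box spanning 2 x 2 x 3 voxels.
mask = box_to_mask((4, 5, 6), ((1, 3), (0, 2), (2, 5)))
tumor_voxels = count_voxels(mask)
```

A mask built from a box marks every voxel inside the box, so it generally covers more than the tumor itself; a true voxel-by-voxel mask follows the tumor outline.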

Building fair tests for AI tools

To make it easy to compare new AI methods, the authors provide fixed training, validation, and test groups, with similar rates of pCR across them. This means different research teams can test their models on the exact same patient sets, making performance claims more trustworthy. The dataset also keeps the natural variety seen in real hospitals: scans come from many centers, different MRI machines, and slightly different ways of defining HR and HER2 positivity. Rather than smoothing these differences away, BreastDCEDL records them clearly, so researchers can decide how to handle them and test whether their models still work across varied patient populations and scanning conditions.
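One common way to get groups with similar pCR rates is a stratified split: patients with and without pCR are shuffled and divided separately, then recombined. The sketch below illustrates that general idea on made-up patient IDs. The fractions, the seed, and the function names are assumptions; the dataset ships its own fixed groups, which should be used for any real comparison.

```python
import random


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split patient IDs into train/val/test with similar pCR rates.

    labels: dict mapping patient ID -> 1 (pCR) or 0 (no pCR).
    fractions and seed are illustrative choices, not the dataset's own;
    the test group receives whatever remains after train and val.
    """
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for outcome in (0, 1):
        # Handle each outcome class separately so every split keeps its share.
        ids = sorted(pid for pid, y in labels.items() if y == outcome)
        rng.shuffle(ids)
        n_train = round(fractions[0] * len(ids))
        n_val = round(fractions[1] * len(ids))
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]
    return splits


def pcr_rate(ids, labels):
    """Fraction of the given patients whose label is 1."""
    return sum(labels[i] for i in ids) / len(ids)


# Synthetic example: 200 made-up patients, 30% with pCR.
labels = {f"P{i:03d}": 1 if i < 60 else 0 for i in range(200)}
splits = stratified_split(labels)
rates = {name: pcr_rate(ids, labels) for name, ids in splits.items()}
```

Fixing the seed makes the split reproducible, which mirrors the purpose of publishing fixed groups: everyone evaluates on the same patients.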

What this unlocks for future research

BreastDCEDL is more than just a stack of images; it is a well-organized toolkit for many types of studies. Researchers can train AI systems to locate tumors, measure tumor volume, predict pCR before treatment starts, and explore how imaging patterns relate to tumor biology. Patients without outcome data still help by providing extra examples for unsupervised and semi-supervised learning. Because all files follow a simple naming system and common format, scientists can quickly load and analyze them using standard software, saving days of manual preparation and reducing the chance of errors.

A clearer path toward personalized treatment

In simple terms, this work turns a messy collection of breast MRI scans from multiple hospitals into a clean, shared foundation for AI research. By standardizing how images and clinical information are stored, and by marking tumors and outcomes consistently, BreastDCEDL gives researchers what they need to build and fairly test computer tools that might one day help doctors choose the right treatment for each patient. While it does not by itself cure cancer, it removes a major obstacle on the road to more precise, data-driven breast cancer care.

Citation: Fridman, N., Solway, B., Fridman, T. et al. BreastDCEDL: A standardized deep learning-ready breast DCE-MRI dataset of 2,070 patients. Sci Data 13, 264 (2026). https://doi.org/10.1038/s41597-026-06589-6

Keywords: breast MRI, cancer imaging, medical AI, treatment response, medical datasets