Clear Sky Science · en

Cross-cancer survival prediction using machine learning models

· Back to index

Why predicting cancer survival in new ways matters

Cancer patients and their families often ask a simple but agonizing question: “How long do I have?” Doctors try to answer using their experience and past data, but for many rarer cancers there just are not enough similar cases to guide precise predictions. This study explores whether modern computer programs can safely “borrow experience” from common cancers to help forecast survival for less common ones, potentially giving more patients clearer expectations and better-tailored care.

Figure 1
Figure 1.

Using past patients to guide future care

The researchers worked with a large trove of real-world information from hospital cancer registries in São Paulo, Brazil. These records cover more than a million patients treated between 2000 and 2019 and include details such as age, tumor stage, treatments received, and whether the person was still alive three years after diagnosis. Focusing on that three-year mark allowed the team to compare cancers with very different typical lifespans while avoiding extremely lopsided data, where almost everyone either survives or dies.

Teaching computers to spot survival patterns

To turn this registry into a prediction tool, the authors used two popular machine learning methods, XGBoost and LightGBM. These methods do not try to understand biology directly; instead, they sift through thousands of patient histories to find patterns linking characteristics like disease stage and treatment timing to later survival. First, the team built “specialist” models, each trained only on one cancer type, such as breast, lung, or stomach cancer. They then checked how well these models could predict three-year survival for new patients with the same cancer, using standard measures that balance correct identification of survivors and non-survivors.

Can one cancer help predict another?

The heart of the study asks a bold question: can a model trained on one cancer type successfully predict survival in a different cancer? To test this, the researchers grouped cancers in two ways: the most common cancers (skin, breast, prostate, colorectal, lung, and cervical) and cancers of the digestive system (oral cavity, oropharynx, esophagus, stomach, small intestine, colorectal, and anus). In a first phase, they trained separate models for each cancer and tried them out on the others, selecting only pairings where both survival and non-survival were predicted with reasonable balance. In later phases, they merged data from selected cancers into shared training sets, creating more general models that drew on patterns across related tumors.

Figure 2
Figure 2.

Where cross-cancer learning helps—and where it does not

For the common cancers, combining data across types did not beat the best specialist models. A single model trained on all six common cancers, for example, predicted less accurately than models tailored to each cancer alone. The story was different for some digestive system cancers. When data from oral cavity, esophagus, and stomach cancers were pooled, the resulting model predicted three-year survival for stomach cancer slightly better than the stomach-only model, with balanced accuracy just above 80 percent. However, statistical tests showed this improvement was not clearly different from chance, meaning the shared model and the specialist model were essentially tied. Similar “almost but not quite better” results appeared for oral cavity, small intestine, and colorectal cancers, often with trade-offs between correctly identifying survivors versus non-survivors.

What this means for patients with rare cancers

Even though cross-cancer models rarely outperformed the best disease-specific models, they often came close—using only information borrowed from other cancer types. For rare cancers that lack large, high-quality datasets, this is an encouraging sign: in the future, doctors may be able to lean on models trained on more common cancers to offer meaningful survival estimates when specialist tools are impossible to build. The authors caution that these methods are not ready for routine clinical use, and that they must be tested in other regions and combined with deeper biological data. Still, the work points toward a future where no patient is left without guidance simply because their cancer is uncommon.

Citation: Cardoso, L.B., Egydio, J.E., Toporcov, T.N. et al. Cross-cancer survival prediction using machine learning models. Sci Rep 16, 9623 (2026). https://doi.org/10.1038/s41598-025-34133-w

Keywords: cancer survival prediction, machine learning in oncology, cross-cancer modeling, rare cancers, clinical registries