Finding the most promising indications for novel treatments in oncology
Why finding the right patients matters
Modern cancer drugs can be lifesaving, but figuring out exactly which groups of patients will benefit is slow, costly, and uncertain. Each new medicine must be tested in specific cancer types and subtypes, and choosing the wrong ones can waste years of research and millions of dollars while patients wait. This study presents a data-driven way to guide those choices earlier and more systematically, using information from more than two million real patients rather than relying mostly on hunches and chance discoveries.

Turning everyday medical data into a map
The authors build an approach they call INSPIRE, which stands for “INdication Selection and Prioritization In Real-world data and Evaluation.” Instead of starting from lab experiments alone, INSPIRE learns from large real-world datasets collected during routine care in the United States—electronic health records and insurance claims for more than two million people with cancer. These records contain a long trail of events for each patient: diagnoses, treatments, lab tests, tumor samples, and more. The team transforms each of these events into a mathematical “feature” and then uses machine learning to place them in a shared space where medical events that tend to occur in similar patients end up close together.
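The idea of a shared space can be illustrated with a toy sketch. It is not the authors' model, which this summary does not specify: here each feature is simply represented by the set of patients it occurs in, and closeness is measured with cosine similarity. The patient records and feature names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy data: patient id -> set of event "features".
patients = {
    "p1": {"dx:lung_adenocarcinoma", "rx:carboplatin", "lab:egfr_test"},
    "p2": {"dx:lung_adenocarcinoma", "rx:carboplatin", "lab:egfr_test"},
    "p3": {"dx:melanoma", "rx:dacarbazine", "lab:braf_test"},
    "p4": {"dx:melanoma", "rx:dacarbazine"},
}

def feature_vectors(patients):
    """Represent each feature as a 0/1 vector over patients (one slot per patient)."""
    ids = sorted(patients)
    vectors = defaultdict(lambda: [0.0] * len(ids))
    for col, pid in enumerate(ids):
        for feature in patients[pid]:
            vectors[feature][col] = 1.0
    return dict(vectors)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vectors = feature_vectors(patients)
# Features seen in the same patients score high; features seen in disjoint patients score 0.
near = cosine(vectors["dx:lung_adenocarcinoma"], vectors["rx:carboplatin"])
far = cosine(vectors["dx:lung_adenocarcinoma"], vectors["rx:dacarbazine"])
print(near, far)
```

A learned embedding would compress these long patient vectors into a few dimensions, but the intuition is the same: features that share patients sit close together.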
Looking beyond broad cancer labels
Most hospital and billing systems describe diseases using administrative codes that emphasize where a tumor is located (for example, which part of the lung) rather than what it looks like under the microscope. For cancer drug development, this is often not precise enough, because two tumors in the same organ can behave very differently and respond to different therapies. INSPIRE tackles this by working directly with pathology reports—the detailed descriptions of tumor tissue. From these reports, the method builds finely grained cancer categories such as specific lung cancer subtypes and separates earlier disease from advanced, metastatic disease. It then “broadcasts” this tumor information along the patient’s timeline so it can be linked to treatments, test results, and other events that happen later.
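One plausible reading of this "broadcasting" step is a forward fill: each event inherits the most recent pathology-derived subtype recorded on or before it. The sketch below assumes that reading. The `Event` type, the field names, and the example timeline are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Event:
    day: int      # days since the first record (hypothetical time unit)
    kind: str     # e.g. "pathology", "treatment", "lab"
    value: str

def broadcast_tumor_info(events):
    """Pair every event with the latest pathology subtype seen so far (or None)."""
    current = None
    labelled = []
    # sorted() is stable, so same-day events keep their input order.
    for event in sorted(events, key=lambda e: e.day):
        if event.kind == "pathology":
            current = event.value
        labelled.append((event, current))
    return labelled

# Illustrative timeline: an administrative code first, then two pathology reports.
timeline = [
    Event(0, "diagnosis_code", "C34.1"),
    Event(5, "pathology", "lung_squamous"),
    Event(20, "treatment", "carboplatin"),
    Event(200, "pathology", "lung_squamous_metastatic"),
    Event(210, "lab", "ldh_high"),
]
labels = [subtype for _, subtype in broadcast_tumor_info(timeline)]
print(labels)
```

Events that precede any pathology report keep `None`, which makes it explicit that no fine-grained subtype was known at that point.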
Testing the method on a major immunotherapy
To see whether INSPIRE could have helped guide real-world decisions, the researchers focused on drugs that block PD‑1, an immune checkpoint targeted by widely used cancer immunotherapies. They simulated the period when these drugs were still new by using only data from 2012 to 2015 and excluding all patients who had received a PD‑1 drug or had the related biomarker test. As "reference" diseases, they chose three cancers that were among the first to gain approval for PD‑1 treatment. INSPIRE then measured how similar every other cancer subtype in the data was to these references, based on patterns in the patient journeys, and produced a ranked list of promising indications without knowing which ones would later receive official approval.
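The ranking step can be sketched as scoring each candidate subtype by its similarity to the reference diseases and sorting the scores. This is a minimal illustration under stated assumptions: the summary does not say how similarities to several references are combined, so the mean cosine similarity used here is a stand-in, and the two-dimensional vectors and placeholder names are invented for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(embeddings, references):
    """Rank non-reference subtypes by mean cosine similarity to the references."""
    ref_vecs = [embeddings[name] for name in references]
    scores = {}
    for name, vec in embeddings.items():
        if name in references:
            continue  # references are inputs, not candidates
        scores[name] = sum(cosine(vec, ref) for ref in ref_vecs) / len(ref_vecs)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical embeddings with placeholder names (not real results).
embeddings = {
    "reference_1": [1.0, 0.0],
    "reference_2": [0.9, 0.1],
    "candidate_A": [0.8, 0.2],
    "candidate_B": [0.0, 1.0],
    "candidate_C": [0.5, 0.5],
}
ranking = rank_candidates(embeddings, {"reference_1", "reference_2"})
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the ranking uses only the embeddings and the reference set, it can be produced before any later approval outcomes are known.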

What the rankings revealed
When the authors “unblinded” the results and compared INSPIRE’s ranked list to the approvals that regulators would grant after 2015, about 70 percent of the cancer indications that eventually gained PD‑1 approval appeared in the top 50 spots. Cancers where PD‑1 drugs repeatedly failed in trials tended to rank lower. The method showed similar performance when the researchers expanded the time window to include more recent years and when they varied the internal parameters of the model, suggesting that the approach is fairly robust. Analyses also indicated that INSPIRE’s internal map of features grouped together medically related items—such as tumor types, treatments, and biomarkers—supporting the idea that it captures meaningful clinical structure rather than random patterns.
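The headline figure is a recall-at-k style measure: of the indications that were later approved, what fraction appear in the top k positions of the ranked list? A minimal sketch of that calculation follows. The function name and the toy inputs are hypothetical and do not reproduce the study's data.

```python
def recall_at_k(ranked, approved, k):
    """Fraction of later-approved indications found in the top-k of the ranking."""
    if not approved:
        raise ValueError("approved set must not be empty")
    top_k = set(ranked[:k])
    return len(top_k & set(approved)) / len(approved)

# Toy example with placeholder indication names.
ranked = ["ind_a", "ind_b", "ind_c", "ind_d", "ind_e", "ind_f"]
approved = {"ind_a", "ind_c", "ind_f"}
score = recall_at_k(ranked, approved, k=3)
print(f"recall@3 = {score:.2f}")
```

In the study's setting, `k` would be 50 and `approved` would hold the indications that gained PD‑1 approval after 2015.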
How this could change cancer drug development
INSPIRE is not meant to replace laboratory science or clinical judgment, but to add another line of evidence. In practice, a company or academic group developing a new cancer drug could feed in a small number of tumor types where there is already strong evidence the drug works. INSPIRE would then use the real-world data map to highlight other cancer subtypes that look similar in terms of how patients present, progress, and are treated. Those indications could be prioritized for further biological study and, eventually, for clinical trials. By improving the odds of choosing the right cancers to test first, approaches like INSPIRE could shorten development timelines, reduce costs, and help patients access effective therapies sooner.
Citation: Eckhoff, M., Klingelschmitt, S., Van Ruijssevelt, L. et al. Finding the most promising indications for novel treatments in oncology. npj Precis. Onc. 10, 135 (2026). https://doi.org/10.1038/s41698-026-01352-x
Keywords: cancer drug development, real-world data, machine learning in oncology, immunotherapy, treatment indication selection