Clear Sky Science

Benchmarking deep learning models for predicting anticancer drug potency (IC50) with insights for medicinal chemists

Why this research matters for future cancer drugs

Designing new cancer drugs is slow and expensive because each promising molecule must be tested in living cells to see how strongly it stops their growth. This study asks a practical question: can modern artificial intelligence tools reliably predict those test results in advance, saving time and cost in the lab? The authors systematically compare several popular deep learning systems, probe when they succeed or fail, and even propose a more realistic way to judge their usefulness for working medicinal chemists.

Figure 1

Measuring how strongly a drug fights cancer cells

When researchers test a potential anticancer compound, they often report a number called IC50: the concentration at which the drug cuts cell growth by half. A low IC50 means a potent drug. But the same compound can have very different IC50 values in different cancer cell lines, and even repeated tests on the same pair of drug and cell can vary by several-fold depending on the assay and conditions. Traditional computer-aided design methods capture how a molecule fits a single protein target, but they struggle with the full complexity of living cells. Newer deep learning methods try to learn patterns directly from large datasets that link chemical structures and detailed genetic information about cancer cells to their measured IC50 values.

Putting five deep learning tools to the test

The authors examined five leading deep learning models, each using a different strategy to represent both drugs and cancer cells. Some treat molecules as graphs of atoms and bonds; others turn cell genetics into structured networks of biological processes or highlight the most informative genes. All models were trained and evaluated on the same curated data from a major resource called GDSC, which contains tens of thousands of measured IC50 values. The team also built a deliberately simple comparison method: a “baseline” that ignores biology and chemistry and just predicts average IC50 values from the training data. This allowed them to ask not only which deep model is best, but whether any of them truly beat a very naive shortcut.
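To make the idea of a naive baseline concrete, here is a minimal Python sketch. The article says only that the baseline predicts average IC50 values from the training data, so the details below are assumptions for illustration: averaging per drug on a log scale, falling back to the global average for unseen drugs, and the names `fit_mean_baseline` and `predict_mean_baseline` are all hypothetical and not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def fit_mean_baseline(train_rows):
    """Fit an average-only baseline.

    train_rows: iterable of (drug_id, cell_id, log_ic50) tuples.
    Assumption: values are averaged per drug on a log scale; the
    study's exact averaging scheme is not described in this article.
    """
    per_drug = defaultdict(list)
    for drug_id, _cell_id, log_ic50 in train_rows:
        per_drug[drug_id].append(log_ic50)

    # Global average, used when a drug never appeared in training.
    global_mean = mean(v for values in per_drug.values() for v in values)
    drug_means = {drug: mean(values) for drug, values in per_drug.items()}
    return drug_means, global_mean


def predict_mean_baseline(model, drug_id):
    """Return the drug's training average, or the global average if unseen."""
    drug_means, global_mean = model
    return drug_means.get(drug_id, global_mean)
```

A baseline of this kind uses no chemical structure and no cell genetics, which is why it is a useful yardstick: a deep model that cannot beat it has not learned much beyond the averages.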

Figure 2

A more realistic way to score predictions

Common machine learning scores, such as correlation and root-mean-squared error, can look impressive yet be hard for bench scientists to interpret. To bridge this gap, the authors re-expressed prediction quality in more intuitive ways, such as percentage error and error on a logarithmic scale that corresponds directly to fold-differences in IC50. Crucially, they also quantified how noisy real IC50 measurements are by mining a large bioactivity database. They showed that, under common assay conditions, 90% of repeated IC50 measurements for the same drug–cell pair fall within about a sevenfold range. Using this, they defined a new metric, Experimental Variability-Aware Prediction Accuracy (EVAPA): the percentage of model predictions that land within that experimentally realistic band.
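The scoring idea described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the band is taken to mean "within `fold_window`-fold of the measured value in either direction", with a default of 7 taken from the sevenfold figure quoted in the text. The study's exact band definition may differ, and the function names `fold_error` and `evapa` are hypothetical.

```python
import math


def fold_error(pred_ic50, true_ic50):
    """Symmetric fold-difference (>= 1) between two positive concentrations."""
    return 10 ** abs(math.log10(pred_ic50) - math.log10(true_ic50))


def evapa(pred_ic50s, true_ic50s, fold_window=7.0):
    """Percentage of predictions within fold_window-fold of the measurement.

    Assumption: the band is symmetric on a log scale around the measured
    value. Both inputs are in the same concentration units.
    """
    if len(pred_ic50s) != len(true_ic50s) or not pred_ic50s:
        raise ValueError("inputs must be non-empty and of equal length")
    limit = math.log10(fold_window)
    hits = sum(
        abs(math.log10(p) - math.log10(t)) <= limit
        for p, t in zip(pred_ic50s, true_ic50s)
    )
    return 100.0 * hits / len(pred_ic50s)
```

For example, with predictions `[1.0, 10.0, 0.1, 2.0]` and measurements `[2.0, 1.0, 0.5, 2.0]`, the fold errors are 2, 10, 5, and 1, so three of the four predictions fall inside a sevenfold window. Working on a log scale is what makes a twofold overestimate and a twofold underestimate count equally.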

Where the models shine and where they struggle

When the data were randomly split so that many drugs and cell lines appeared in both training and test sets, all deep learning models performed well. They showed strong correlations with measured IC50 values and high EVAPA scores, clearly beating the simple baseline. Performance stayed reasonably good when the models were asked to generalize to entirely new cell lines while still seeing familiar drugs; in this case, even the baseline did surprisingly well, suggesting that average drug behavior across many cell types already carries useful information. The real trouble came when the models faced new chemical structures: accuracy dropped sharply, correlations neared zero or even became negative, and in some tests the simple baseline matched or outperformed the deep models. The team also checked whether prediction errors depended on basic drug properties such as size, polarity, or flexibility, or on the tissue origin of the cell lines. They found only weak relationships, implying that the models work about equally well across diverse chemistries and cancer types, yet still falter on truly novel compounds.
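The three evaluation scenarios differ only in how the data are divided. The Python sketch below shows one way to build such splits; it assumes each record is a `(drug_id, cell_id, log_ic50)` tuple, and the function names, the 20% test fraction, and the seeding are hypothetical choices for illustration rather than the study's protocol.

```python
import random


def random_split(rows, test_frac=0.2, seed=0):
    """Random split: the same drugs and cell lines can land on both sides."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def group_split(rows, key_index, test_frac=0.2, seed=0):
    """Hold out whole groups so the test set contains only unseen ones.

    key_index=0 holds out drugs ("new drug" scenario);
    key_index=1 holds out cell lines ("new cell line" scenario).
    """
    rng = random.Random(seed)
    groups = sorted({row[key_index] for row in rows})
    rng.shuffle(groups)
    n_test = max(1, int(len(groups) * test_frac))
    held_out = set(groups[:n_test])
    train = [row for row in rows if row[key_index] not in held_out]
    test = [row for row in rows if row[key_index] in held_out]
    return train, test
```

Holding out whole drugs guarantees that no test molecule was seen during training, which is the setting in which the article reports the sharpest drop in accuracy.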

Trying truly new molecules from recent studies

To move beyond public databases, the authors assembled more than 150 recently reported anticancer compounds from the medicinal chemistry literature and tested several of the deep learning models on these unseen molecules. The results mirrored the “new drug” scenario in the GDSC data: predictions were noisy, with large percentage errors and only moderate fractions of predictions falling within realistic experimental bounds. Still, the behavior of the models across different assay types suggested that they captured some assay-independent patterns in how drugs affect cells. A simple web server built from these models now allows chemists to input a structure and obtain predicted IC50 values for hundreds of cancer cell lines, with the caveat that reliability is highest when the new molecule resembles those already in the training set.

What this means for drug discovery

This work shows that current deep learning tools are already useful for ranking and exploring cancer drug ideas when they operate within familiar chemical territory, but they are far from being crystal balls for truly new molecular designs. By highlighting that a crude average-based model can sometimes rival complex neural networks, and by introducing an accuracy measure grounded in real experimental variability, the study gives medicinal chemists a clearer sense of what to expect from IC50 prediction software. The message is balanced: these models are promising aids for drug discovery, especially when carefully benchmarked, but meaningful leaps in architecture and training, particularly for out-of-distribution molecules, are still needed before they can reliably guide the search for the next generation of cancer therapies.

Citation: Garai, U., Pal, A.S., Ghosh, K. et al. Benchmarking deep learning models for predicting anticancer drug potency (IC50) with insights for medicinal chemists. Commun Chem 9, 106 (2026). https://doi.org/10.1038/s42004-026-01916-9

Keywords: anticancer drug potency, IC50 prediction, deep learning models, cancer cell lines, computational drug discovery