Clear Sky Science

Automated quantification of tumor-infiltrating lymphocytes by machine learning reveals prognostic and immunogenomic features in lung cancer


Why counting tiny immune cells in lung tumors matters

Lung cancer is still one of the deadliest cancers, but not all tumors behave the same way. Some are heavily patrolled by immune cells that slip inside the tumor, while others remain almost untouched. These tumor-infiltrating lymphocytes, or TILs, can hint at how a patient will fare and whether they might benefit from modern immunotherapy drugs. The challenge is that today, TILs are usually counted by eye under a microscope, which is slow and subjective. This study asks a timely question: can we use machine learning to automatically measure these cells on routine pathology slides, and what does that reveal about lung cancer biology and patient survival?

Figure 1.

Turning ordinary slides into digital maps

The researchers focused on lung adenocarcinoma, a common type of lung cancer, using public data from The Cancer Genome Atlas along with an independent set of patients from their own hospital. For each patient, they analyzed standard hematoxylin and eosin (H&E)–stained tissue slides, the pink-and-purple images every pathologist knows well. With open-source QuPath software, they built a stepwise pipeline: first, they corrected color differences between slides; next, they used a watershed algorithm to separate overlapping cell nuclei; finally, a trained computer classifier labeled each detected cell as tumor, supporting tissue (stroma), or lymphocyte. Two expert pathologists repeatedly reviewed and corrected the machine’s work until it reliably recognized the different cell types on its own.
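The color-correction step can be pictured with a deliberately simple stand-in. The Python sketch below matches each RGB channel's mean and standard deviation to those of a reference slide, a simplified RGB-only variant of statistics-matching color transfer. This is an assumption for illustration: the article does not say how color was corrected in this study, and the function names `channel_stats` and `match_color_stats` and the reference values are hypothetical.

```python
from statistics import fmean, pstdev

def channel_stats(pixels):
    """(mean, population std) for each color channel of a list of RGB tuples."""
    return [(fmean(channel), pstdev(channel)) for channel in zip(*pixels)]

def match_color_stats(pixels, target_stats):
    """Shift and scale each channel so its mean/std match target_stats.

    Hypothetical, simplified stand-in for stain normalization; not
    necessarily the method used in the study or by QuPath.
    Values are rounded and clipped to the 8-bit range 0..255.
    """
    source_stats = channel_stats(pixels)
    normalized = []
    for pixel in pixels:
        new_pixel = []
        for value, (s_mean, s_std), (t_mean, t_std) in zip(pixel, source_stats, target_stats):
            if s_std == 0:
                scaled = t_mean  # flat channel: just move it to the target mean
            else:
                scaled = (value - s_mean) / s_std * t_std + t_mean
            new_pixel.append(min(255, max(0, round(scaled))))
        normalized.append(tuple(new_pixel))
    return normalized

# Usage with made-up numbers: map this slide onto a reference slide's statistics.
slide = [(100, 100, 100), (200, 200, 200)]
reference = [(120.0, 25.0)] * 3  # hypothetical (mean, std) per channel
print(match_color_stats(slide, reference))  # prints [(95, 95, 95), (145, 145, 145)]
```

Published normalization methods usually work in a perceptual color space or separate the hematoxylin and eosin stains before adjusting them; per-channel RGB matching is used here only because it is short enough to read at a glance.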

Linking immune cell counts to patient outcomes

Once the system could confidently identify cells, the team calculated how many lymphocytes were present per square millimeter of tumor tissue for more than 300 patients. They found that TIL levels varied widely and that lymphocytes, on average, made up only a small fraction of all cells. Using a statistical approach to find the most informative cut-off, they chose 135 TILs per square millimeter as the dividing line between “high” and “low” TIL tumors. Patients whose tumors crossed this threshold lived longer than those with sparse immune cell infiltration, and this pattern held up in both the original and validation groups. In other words, a simple number produced by an automated tool captured meaningful differences in survival, echoing earlier, more labor‑intensive studies that relied on manual counting.
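The density and cut-off logic in this paragraph can be sketched in a few lines of Python. The article does not say which statistical method picked 135 TILs/mm²; the scan below assumes one common choice, a maximally selected log-rank statistic, which tries each observed density as a candidate split and keeps the one that best separates survival. All function names are hypothetical, the `min_frac` guard against tiny groups is an assumed setting, and the toy data are invented purely to exercise the code.

```python
# Hypothetical sketch: TIL density, the 135 TILs/mm^2 split, and one common
# way (a maximally selected log-rank statistic) to choose such a cut-off.

def til_density(cell_labels, tumor_area_mm2):
    """Lymphocytes per square millimetre from per-cell class labels."""
    n_lymph = sum(1 for label in cell_labels if label == "lymphocyte")
    return n_lymph / tumor_area_mm2

def til_group(density, cutoff=135.0):
    """'high' at or above the cut-off, otherwise 'low'."""
    return "high" if density >= cutoff else "low"

def logrank_chi2(times, events, in_group1):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_group1[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_group1[i])
        o_minus_e += d1 - n1 * d / n  # observed minus expected events in group 1
        if n > 1:
            var += n1 * (n - n1) * d * (n - d) / (n * n * (n - 1))
    return o_minus_e ** 2 / var if var > 0 else 0.0

def best_cutoff(densities, times, events, min_frac=0.25):
    """Scan observed densities; return (cutoff, statistic) with the largest chi-square."""
    n = len(densities)
    best = None
    for c in sorted(set(densities)):
        high = [d >= c for d in densities]
        k = sum(high)
        if k < min_frac * n or n - k < min_frac * n:
            continue  # skip splits that leave one group too small
        stat = logrank_chi2(times, events, high)
        if best is None or stat > best[1]:
            best = (c, stat)
    return best

# Toy example (invented values): 270 lymphocytes in 2.0 mm^2 of tumor.
labels = ["lymphocyte"] * 270 + ["tumor"] * 500
density = til_density(labels, 2.0)
print(density, til_group(density))  # prints: 135.0 high

densities = [50, 80, 100, 120, 150, 200, 250, 300]
times = [5, 8, 6, 10, 30, 40, 35, 50]  # toy follow-up times
events = [1, 1, 1, 1, 1, 0, 1, 0]      # 1 = death observed, 0 = censored
print(best_cutoff(densities, times, events))
```

Because a cut-off chosen this way maximizes the separation on the same patients, its apparent strength is optimistic; checking it in an independent cohort, as the study did with its hospital patients, is the usual safeguard.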

Figure 2.

What immune-rich tumors look like under the hood

Because genetic and molecular data were available for many of these tumors, the authors could explore what distinguished high‑TIL from low‑TIL cancers beyond simple cell counts. Tumors teeming with lymphocytes showed stronger signatures of immune activity: genes involved in recognizing abnormal proteins, presenting them to T cells, and coordinating immune attack were all more active. These tumors also carried a broader mix of DNA mutations, which can create novel targets for the immune system. By contrast, low‑TIL tumors favored genes linked to building ribosomes and making proteins, a sign of heavy growth machinery but relatively quiet immune engagement. This split mirrors the now familiar contrast between “hot” tumors, rich in immune cells and more likely to respond to immunotherapy, and “cold” tumors, which the immune system mostly ignores.

Teaching a computer to predict immune status

The team went a step further and asked whether a compact set of image features could predict whether a tumor would fall into the high‑ or low‑TIL category without explicitly counting every lymphocyte. They summarized subtle textural patterns in the slides, that is, how pixel intensities change across small neighborhoods, into so‑called Haralick features, and combined these with the tumor’s clinical stage in a random forest model. In cross‑validation, this classifier separated high‑ from low‑TIL tumors with high accuracy, and it retained reasonable performance in an independent hospital cohort. Importantly, the entire approach runs on standard computers using freely available software, suggesting that many pathology labs could, in principle, adopt it without specialized hardware.
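To make “Haralick features” concrete, here is a minimal pure-Python sketch, an illustration rather than the study’s code. It builds a gray-level co-occurrence matrix (GLCM), which records how often gray level i sits next to gray level j at a fixed pixel offset, and then computes four of Haralick’s statistics from it: contrast, angular second moment, homogeneity, and entropy. The horizontal offset, the number of gray levels, and the names `glcm` and `haralick_subset` are assumptions; the study’s exact feature set and settings are not described here.

```python
import math

def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray-level co-occurrence matrix for one pixel offset (dx, dy).

    image: 2-D list of integers already quantized to 0..levels-1.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:  # count the neighbor pair in both directions
                    counts[j][i] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no pixel pairs at this offset")
    return [[v / total for v in row] for row in counts]

def haralick_subset(p):
    """Four Haralick texture statistics from a normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),  # angular second moment
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }

# Usage on a toy 4-level image patch (invented values):
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
print(haralick_subset(glcm(patch, levels=4)))
```

In the study, features of this kind were combined with clinical stage in a random forest. With scikit-learn, for example, that would typically mean one feature row per slide fed to `sklearn.ensemble.RandomForestClassifier`, though which software and settings the authors used is not stated in this summary.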

What this means for future lung cancer care

For a non‑specialist, the key message is that a computer can learn to read routine lung cancer slides in a way that captures how strongly the immune system has engaged the tumor. High levels of infiltrating lymphocytes signal a more active immune battle, a richer landscape of mutations, and better overall survival. Although more work is needed—especially in patients actually treated with immunotherapy—this automated method could eventually help doctors sort tumors into immune “hot” and “cold” categories quickly and consistently. That, in turn, might guide decisions about who is most likely to benefit from immune‑based treatments and spark new strategies to turn cold tumors hot.

Citation: Li, A., Pang, Y., Zhang, H. et al. Automated quantification of tumor-infiltrating lymphocytes by machine learning reveals prognostic and immunogenomic features in lung cancer. Sci Rep 16, 7006 (2026). https://doi.org/10.1038/s41598-026-37076-y

Keywords: lung adenocarcinoma, tumor-infiltrating lymphocytes, machine learning, digital pathology, cancer immunotherapy