Clear Sky Science

Deciphering DEL pocket patterns through contrastive learning

Why looking at protein "pockets" can speed up new medicines

Modern drug hunters can now test trillions of tiny molecules at once using DNA-encoded libraries, or DELs. Yet only a handful of these DEL-derived molecules have become real medicines. One big missing piece is knowing which proteins in the body actually have the right kinds of nooks and crannies, known as "pockets," for DEL molecules to latch onto. This study tackles that gap by mapping what successful DEL pockets look like and building an artificial intelligence model, called ErePOC, to find similar pockets across the human proteome, the full set of proteins in the human body.

How DEL technology searches for new drug molecules

DELs work a bit like barcoded fishing lures. Chemists attach small candidate molecules to short pieces of DNA that act as ID tags, then expose vast mixtures of these tagged molecules to a protein of interest. Molecules that stick are read out by sequencing the DNA. This approach is fast and inexpensive, but turning DEL hits into real drugs is still difficult. One reason is that DEL molecules share certain chemical constraints, such as how they are made in water and how the DNA tag is attached. These constraints mean they tend to prefer particular types of protein pockets, but until now, those preferences had not been mapped in a systematic way.
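The readout step described above can be illustrated with a toy calculation. This is a minimal sketch, not the pipeline used in the study: it assumes each sequencing read has already been reduced to a barcode string, and the function name `enrichment`, the pseudocount, and the example barcodes are all hypothetical. It compares how often each barcode appears after selection against the protein with how often it appeared in the starting library.

```python
from collections import Counter

def enrichment(selected_reads, input_reads, pseudocount=1.0):
    """Ratio of post-selection frequency to input frequency for each barcode.

    A pseudocount is added so barcodes missing from one sample do not
    cause a division by zero. This is an illustrative choice only.
    """
    selected = Counter(selected_reads)
    library = Counter(input_reads)
    barcodes = set(selected) | set(library)
    n_barcodes = len(barcodes)
    selected_total = sum(selected.values()) + pseudocount * n_barcodes
    library_total = sum(library.values()) + pseudocount * n_barcodes

    scores = {}
    for barcode in barcodes:
        selected_freq = (selected[barcode] + pseudocount) / selected_total
        library_freq = (library[barcode] + pseudocount) / library_total
        scores[barcode] = selected_freq / library_freq
    return scores

# Hypothetical barcodes: each one stands for a different tagged molecule.
input_reads = ["AAC", "AAC", "GTT", "GTT", "CCA", "CCA"]
selected_reads = ["AAC", "AAC", "AAC", "AAC", "GTT", "CCA"]

scores = enrichment(selected_reads, input_reads)
top_barcode = max(scores, key=scores.get)
print(top_barcode, round(scores[top_barcode], 3))
```

In this toy data the molecule tagged `AAC` is the one that "stuck," because its share of reads grows after selection. Real DEL analyses work with far larger read counts and use more careful statistics than a simple ratio.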

Figure 1.

What makes a pocket attractive to DEL molecules

The authors first compared thousands of protein pockets that bind different types of ligands: ordinary biological small molecules, FDA-approved drugs, and DEL hits. They found that DEL and drug pockets tend to be larger and more chemically complex than pockets for natural ligands. In particular, DEL pockets are more open and hydrophobic (meaning they favor oily, water-repelling interactions) while still keeping a small but important set of polar contact points that fine-tune binding. Certain bulky amino acids that provide aromatic and hydrophobic surfaces, such as tyrosine and phenylalanine, show up more often in DEL and drug-binding pockets than in typical protein surfaces. Overall, DEL pockets look more like classic drug-target pockets than like ordinary metabolic sites, but with an extra bias toward large, hydrophobic cavities.
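To make the idea of "size and chemistry counts" concrete, here is a minimal sketch of how a pocket could be summarized from the amino acids that line it. The residue groupings below are one common textbook classification and are an assumption, not the descriptors used in the study; the function `pocket_profile` and the example pocket are hypothetical.

```python
# Assumed residue classes (one-letter codes); the study's own definitions may differ.
HYDROPHOBIC = set("AVLIMFWY")
AROMATIC = set("FWY")
POLAR = set("STNQDEKRH")

def pocket_profile(residues):
    """Summarize a pocket given the one-letter codes of its lining residues."""
    residues = [r.upper() for r in residues]
    size = len(residues)
    if size == 0:
        raise ValueError("pocket has no residues")

    def fraction(group):
        return sum(r in group for r in residues) / size

    return {
        "size": size,
        "hydrophobic": fraction(HYDROPHOBIC),
        "aromatic": fraction(AROMATIC),
        "polar": fraction(POLAR),
    }

# Hypothetical pocket: mostly oily residues plus a few polar contact points.
example_pocket = "FYLLVWAITSDK"
profile = pocket_profile(example_pocket)
print(profile)
```

A pocket like this one, dominated by hydrophobic and aromatic residues with a minority of polar ones, matches the qualitative pattern the authors describe for DEL pockets. Comparing such profiles across many pockets is the kind of simple count the next section moves beyond.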

Teaching an AI model to recognize pocket "personalities"

To go beyond simple size and chemistry counts, the team built ErePOC, a representation model that treats each binding pocket as a kind of fingerprint. It starts from protein language model embeddings, which capture patterns learned from millions of sequences, and compresses the information about the residues that form a pocket into a compact numerical vector. Using contrastive learning, ErePOC is trained so that pockets binding chemically similar ligands end up close together in this abstract space, and those binding very different molecules drift apart. When the authors visualized this space, pockets known to bind the same cofactors, like ATP or heme, formed well-separated clusters, showing that the model had learned to group pockets by functional behavior rather than just by overall protein shape.
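The "pull together, push apart" idea behind contrastive learning can be shown with a small worked example. This sketch uses the classic pairwise contrastive loss with a margin, chosen here for simplicity; the summary does not state which loss ErePOC uses, so this should not be read as the authors' formulation. The two-dimensional vectors and the names `atp_pocket_a`, `atp_pocket_b`, and `heme_pocket` are made up for illustration.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(u, v, similar, margin=1.0):
    """Pairwise contrastive loss.

    Similar pairs are penalized for being far apart. Dissimilar pairs are
    penalized only while they sit closer than `margin`.
    """
    distance = euclidean(u, v)
    if similar:
        return distance ** 2
    return max(0.0, margin - distance) ** 2

# Toy 2-D embeddings; real pocket embeddings have many more dimensions.
atp_pocket_a = [0.9, 0.1]
atp_pocket_b = [0.8, 0.2]
heme_pocket = [0.1, 0.9]

same_ligand_loss = contrastive_loss(atp_pocket_a, atp_pocket_b, similar=True)
different_ligand_loss = contrastive_loss(atp_pocket_a, heme_pocket, similar=False)
mislabeled_loss = contrastive_loss(atp_pocket_a, atp_pocket_b, similar=False)
print(same_ligand_loss, different_ligand_loss, mislabeled_loss)
```

Training adjusts the model so that this loss shrinks across many pairs: the two ATP pockets already sit close together and incur only a small penalty, the heme pocket is beyond the margin and incurs none, and two nearby pockets that should be separated incur a large one.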

Finding DEL-friendly targets across the human proteome

With ErePOC trained, the researchers projected known DEL pockets, drug pockets, and hundreds of thousands of pockets from experimental and predicted protein structures into the same landscape. DEL pockets scattered widely, indicating that DEL screening can in principle reach much of the traditional "druggable" space, but they still showed clear preferences for certain regions associated with larger, hydrophobic pockets. The team then scanned more than 23,000 AlphaFold-predicted human proteins, filtering for well-defined pockets and asking which ones most closely resembled known DEL pockets in ErePOC space. They identified nearly 2,800 human proteins with pockets highly similar to successful DEL sites, with strong enrichment in families such as transferases, hydrolases, oxidoreductases, chromatin regulators, and some RNA-binding proteins. Follow-up computer docking with a large virtual DEL suggested that these ErePOC-flagged pockets indeed tend to bind DEL-like molecules more favorably.
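The scanning step, asking which pockets most closely resemble known DEL pockets, amounts to a nearest-neighbor search in the embedding space. The sketch below assumes cosine similarity and a fixed cutoff; neither the similarity measure nor the threshold is given in this summary, and the protein names, vectors, and function `rank_by_del_similarity` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_by_del_similarity(candidates, del_references, threshold=0.9):
    """Keep candidates whose best match to any DEL reference meets the threshold."""
    hits = []
    for name, vector in candidates.items():
        best = max(cosine(vector, ref) for ref in del_references)
        if best >= threshold:
            hits.append((name, best))
    # Most DEL-like pockets first.
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Made-up 3-D embeddings standing in for known DEL pockets.
del_references = [[1.0, 0.0, 0.2], [0.8, 0.3, 0.1]]
# Made-up embeddings standing in for pockets from predicted human structures.
candidates = {
    "protein_A": [0.9, 0.1, 0.2],
    "protein_B": [0.0, 1.0, 0.0],
    "protein_C": [0.7, 0.4, 0.1],
}

hits = rank_by_del_similarity(candidates, del_references)
for name, score in hits:
    print(name, round(score, 3))
```

Here `protein_A` and `protein_C` pass the cutoff while `protein_B` points in a different direction and is dropped. At proteome scale the same comparison would be run over hundreds of thousands of pockets, typically with vectorized or indexed search rather than Python loops.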

Figure 2.

Why this matters for future drug discovery

For non-specialists, the key takeaway is that the success of ultra-large chemical libraries depends as much on choosing the right protein pockets as on the molecules themselves. This work shows that DEL hits tend to come from pockets that are large, open, and hydrophobic, and introduces an AI tool for recognizing such pockets from sequence or structure alone. By using ErePOC to focus DEL screening on proteins whose pockets already look DEL-compatible, drug hunters can prioritize more promising targets, reduce wasted screening effort, and potentially expand into less explored classes such as chromatin regulators and RNA-binding proteins. In short, the study offers both a clearer picture of what a "DEL-ready" pocket looks like and a practical map and compass for finding many more of them across the human proteome.

Citation: Zhang, W., Wang, Y., Zhan, R. et al. Deciphering DEL pocket patterns through contrastive learning. Nat Commun 17, 2810 (2026). https://doi.org/10.1038/s41467-026-69663-y

Keywords: DNA-encoded libraries, protein binding pockets, contrastive learning, drug discovery AI, ErePOC