Clear Sky Science · en

De novo covalent drug generation with enhanced drug-likeness and safety


Why Smarter Covalent Drugs Matter

Cancer therapies and antiviral drugs often work by latching onto proteins inside our cells. A powerful class of medicines, called covalent drugs, goes a step further: they form tight, long‑lasting bonds with their targets, which can make them very effective. But that same stickiness can also cause serious side effects if the drugs bind to the wrong proteins. This study introduces CovaGEN, a new artificial‑intelligence system designed to invent covalent drug candidates from scratch while keeping both usefulness and safety in mind.

Figure 1.

From Trial‑and‑Error to Smart Design

Traditional covalent drug discovery is slow and expensive. Chemists either start with an existing drug and bolt on a reactive “handle,” or they begin with a reactive fragment and laboriously tune the rest of the molecule. Computer‑aided methods help by screening large libraries of known compounds, but they are limited to what already exists. Recent advances in deep learning can generate entirely new molecules, yet these tools usually focus on conventional, non‑covalent drugs and often ignore crucial traits such as how easy a compound is to make, whether it behaves like a real medicine, or how toxic it might be. CovaGEN aims to tackle all of these issues at once.

A Hidden Map of Drug‑Like Chemistry

The authors first built a “map” of chemical space using more than a million drug‑like molecules from the public ZINC database. They trained a type of neural network called a variational autoencoder to compress each molecule into a compact numerical code, or latent vector, and then reconstruct it. This step teaches the model the basic grammar of medicinal chemistry: what looks realistic, what respects common rules for oral drugs, and what tends to be synthetically feasible. On this learned map, a second model based on diffusion (a process that gradually turns structure into noise and then learns to reverse that corruption) was trained to generate new latent vectors that decode into valid, diverse, and realistic molecules.
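To make the "structure into noise" half of diffusion concrete, here is a minimal sketch of the standard forward noising step applied to a latent vector. It is an illustration under assumptions, not CovaGEN's code: the linear noise schedule, the 1000 steps, the function names, and the four-number stand-in latent are all hypothetical choices that do not come from the paper.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, rising linearly (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_signal(betas):
    """alpha_bar[t] = product of (1 - beta) up to and including step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(latent, alpha_bar, rng):
    """Noised latent: sqrt(alpha_bar) * z0 + sqrt(1 - alpha_bar) * eps."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * z + noise * rng.gauss(0.0, 1.0) for z in latent]

rng = random.Random(0)
latent = [1.5, -0.7, 0.2, 2.0]  # hypothetical stand-in for a VAE latent vector
alpha_bars = cumulative_signal(linear_beta_schedule(1000))

# Early steps keep almost all of the signal; late steps are almost pure noise.
for t in (0, 499, 999):
    noisy = add_noise(latent, alpha_bars[t], rng)
    print(t, round(math.sqrt(alpha_bars[t]), 3), [round(v, 2) for v in noisy])
```

A generative model of this kind is trained to undo these steps one at a time, so that starting from pure noise it arrives at a latent vector the autoencoder can decode into a molecule.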

Tuning Molecules to Proteins and Sticky Hooks

Next, CovaGEN learns to design molecules for specific protein targets. The system takes the amino‑acid sequence of a protein and encodes it with a protein language model, which captures patterns related to structure and function. During diffusion, the model attends to this protein representation through cross‑attention, so that the molecules it generates are predisposed to fit the target’s binding site. In tests on a large benchmark set, these tailored molecules not only scored well for predicted binding strength but also showed better drug‑likeness and synthetic accessibility than those from several state‑of‑the‑art 3D design tools. To turn these non‑covalent binders into covalent candidates, the team added a classifier that nudges the diffusion process toward molecules bearing certain reactive groups, known as covalent warheads, without sacrificing overall quality.
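The classifier "nudge" can be illustrated with a toy version of classifier guidance: at each update, the gradient of a classifier's log-probability is added to the generative model's own score. Everything in this sketch is an assumption made for illustration: a standard Gaussian stands in for the learned diffusion model, a two-weight logistic classifier stands in for the warhead classifier, the noise term of real sampling is omitted, and the names and parameter values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def prior_score(x):
    """Gradient of log N(x; 0, I): stands in for the learned diffusion score."""
    return [-xi for xi in x]

def classifier_grad(x, weights, bias):
    """Gradient of log p(warhead | x) for a logistic classifier (toy stand-in)."""
    logit = sum(w * xi for w, xi in zip(weights, x)) + bias
    return [(1.0 - sigmoid(logit)) * w for w in weights]

def guided_ascent(x, weights, bias, scale, step=0.1, num_steps=200):
    """Follow prior score + scale * classifier gradient (noise-free for clarity)."""
    for _ in range(num_steps):
        s = prior_score(x)
        g = classifier_grad(x, weights, bias)
        x = [xi + step * (si + scale * gi) for xi, si, gi in zip(x, s, g)]
    return x

def warhead_probability(x, weights, bias):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

weights, bias = [2.0, -1.0], -1.0  # hypothetical classifier parameters
start = [0.5, 0.5]

unguided = guided_ascent(start, weights, bias, scale=0.0)
guided = guided_ascent(start, weights, bias, scale=3.0)
print("unguided p(warhead):", round(warhead_probability(unguided, weights, bias), 3))
print("guided   p(warhead):", round(warhead_probability(guided, weights, bias), 3))
```

The `scale` parameter sets the trade-off: at zero the sample simply follows the generative model, while larger values pull it toward regions the classifier labels as warhead-bearing, at the cost of drifting further from what the base model considers typical.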

Figure 2.

Building in Safety from the Start

Because covalent drugs can cause lasting damage if they hit the wrong targets, safety is a central concern. The researchers therefore treated toxicity as a quantity to be optimized, not an afterthought. They trained predictive models of acute toxicity and organ‑specific risks, then applied reinforcement learning to fine‑tune the diffusion model so that it prefers molecules with lower predicted toxicity. After this training, CovaGEN generated compounds that retained good binding and drug‑like features but were less likely to contain structural “red flags” associated with harmful effects. Importantly, the method achieved this without collapsing into a narrow set of similar molecules, preserving chemical diversity.
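The idea of rewarding low predicted toxicity can be shown with a toy policy-gradient update. This summary does not say which reinforcement-learning algorithm the authors used, so the sketch below is only a generic illustration: a softmax policy over four hypothetical candidates, with made-up toxicity scores used purely as example inputs and the reward defined as the negative of that score.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_value(probs, values):
    return sum(p * v for p, v in zip(probs, values))

def policy_gradient_step(logits, rewards, learning_rate):
    """Exact policy gradient for a softmax policy: p_i * (r_i - baseline)."""
    probs = softmax(logits)
    baseline = expected_value(probs, rewards)
    return [
        logit + learning_rate * p * (r - baseline)
        for logit, p, r in zip(logits, probs, rewards)
    ]

# Made-up toxicity scores in [0, 1] for four hypothetical candidates.
predicted_toxicity = [0.9, 0.6, 0.3, 0.2]
rewards = [-t for t in predicted_toxicity]  # lower toxicity means higher reward

logits = [0.0, 0.0, 0.0, 0.0]               # start from a uniform policy
before = expected_value(softmax(logits), predicted_toxicity)
for _ in range(200):
    logits = policy_gradient_step(logits, rewards, learning_rate=1.0)
after_probs = softmax(logits)
after = expected_value(after_probs, predicted_toxicity)

print("expected toxicity before:", round(before, 3))
print("expected toxicity after: ", round(after, 3))
print("policy after training:   ", [round(p, 3) for p in after_probs])
```

This toy has no term that rewards variety, so with enough updates it concentrates on the single least toxic candidate. The study reports that CovaGEN avoided that kind of collapse; how it did so is not described in this summary.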

Putting the Method to the Test

To demonstrate real‑world potential, the team asked CovaGEN to design covalent inhibitors for two high‑value targets: a mutant form of the EGFR protein involved in drug‑resistant lung cancer, and the main protease of SARS‑CoV‑2, the virus behind COVID‑19. For each protein, the system generated many candidate molecules equipped with the appropriate warheads. Computer docking studies suggested that CovaGEN’s designs were more likely to adopt poses capable of forming covalent bonds than compounds in which warheads were simply bolted onto non‑covalent designs. Several top candidates also matched or exceeded reference drugs on combined measures of predicted binding, drug‑likeness, and synthetic ease.

What This Means for Future Medicines

CovaGEN shows that modern generative AI can do more than just imagine molecules that stick tightly to proteins. By working in a space of inherently drug‑like structures, guiding the addition of reactive groups, and explicitly steering away from toxicity, the system moves covalent drug discovery toward a more holistic and automated process. While further laboratory testing is needed to confirm how these virtual molecules behave in living systems, the approach opens a path to faster, broader exploration of covalent medicines that are not only potent but also safer for patients.

Citation: Zhang, W., Liu, T., Dong, X. et al. De novo covalent drug generation with enhanced drug-likeness and safety. Commun Biol 9, 446 (2026). https://doi.org/10.1038/s42003-026-09725-5

Keywords: covalent drugs, generative AI, drug discovery, diffusion models, toxicity prediction