Clear Sky Science

MAMMAL - Molecular Aligned Multi-Modal Architecture and Language for biomedical discovery

2026-05-04

Why Smarter Drug Discovery Matters

Finding new medicines is slow, risky, and extremely expensive. Most drug candidates still fail in clinical trials, often after years of work. At the same time, biology labs now produce oceans of data about genes, proteins, cells, and chemicals. This article introduces MAMMAL, a new kind of artificial intelligence system that learns from all of these data types at once. By connecting patterns across molecules, cells, and drugs, it aims to help scientists pick better targets, design better medicines, and avoid costly dead ends earlier in the process.

One Brain for Many Kinds of Biological Data

Today’s AI tools in biomedicine are often specialists: one model handles protein sequences, another handles small molecules, and yet another looks only at gene activity. MAMMAL takes a different route. It treats proteins, antibodies, small-molecule drugs, and gene expression profiles as different kinds of "sentences" that can all be read by the same model. To do this, the researchers built a flexible way of turning each data type into a shared sequence format, and they trained a large transformer-based network—similar in spirit to modern language models—on about two billion examples drawn from public protein, antibody, chemical, and cell-level datasets.
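The idea of a shared sequence format can be illustrated with a minimal sketch. The tag names (`<PROTEIN>`, `<SMILES>`, `<GENES>`, `<END>`) and the helper functions below are hypothetical and chosen only for illustration; the article does not describe MAMMAL's actual vocabulary or tokenizer.

```python
from typing import List

# Hypothetical modality tags; MAMMAL's real vocabulary is not given in the article.
PROTEIN_TAG, SMILES_TAG, GENES_TAG, END_TAG = "<PROTEIN>", "<SMILES>", "<GENES>", "<END>"


def tokenize_protein(sequence: str) -> List[str]:
    """One token per amino-acid letter, wrapped in a modality tag."""
    return [PROTEIN_TAG] + list(sequence.upper()) + [END_TAG]


def tokenize_smiles(smiles: str) -> List[str]:
    """Naive character-level split of a SMILES string.

    Simplification: real chemistry tokenizers keep multi-character
    atoms such as 'Cl' or 'Br' together.
    """
    return [SMILES_TAG] + list(smiles) + [END_TAG]


def tokenize_expression(genes_by_expression: List[str]) -> List[str]:
    """Represent a gene expression profile as gene names, highest expression first."""
    return [GENES_TAG] + [f"[{gene}]" for gene in genes_by_expression] + [END_TAG]


def build_input(*segments: List[str]) -> List[str]:
    """Concatenate tagged segments into one token sequence for a single model."""
    combined: List[str] = []
    for segment in segments:
        combined.extend(segment)
    return combined


# A protein fragment and a small molecule (ethanol) in one sequence.
example = build_input(tokenize_protein("MKT"), tokenize_smiles("CCO"))
print(example)
```

Because every modality ends up as a tagged run of tokens, one model can in principle read a protein, a molecule, and a cell profile side by side, which is the property the paragraph above describes.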

Learning the Language of Drugs and Cells

MAMMAL is designed to both understand and generate biological information. It can classify, rank, or predict numbers such as binding strength or drug potency, and it can also invent new sequences, for example suggesting new antibody fragments. A key feature is that it does not just see symbols; it can also take in and produce numerical values directly, such as measurements from lab assays. This helps it reason about how strongly a drug binds to a protein or how a cancer cell responds to treatment. All of these tasks are framed as variations of one core activity: turning one sequence into another, much like translating between languages.
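One plausible way to let a sequence model carry numbers as well as symbols is to keep a placeholder token at each numeric position and store the value in a parallel list. The sketch below assumes that layout; the `<SCALAR>` token, the task tag, the `EncodedSequence` class, and the example value are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class EncodedSequence:
    tokens: List[str]                # symbolic view of the sequence
    scalars: List[Optional[float]]   # numeric value per position, None for ordinary tokens


def encode(items: List[Union[str, float]]) -> EncodedSequence:
    """Split a mixed list of symbols and numbers into two aligned lists."""
    tokens: List[str] = []
    scalars: List[Optional[float]] = []
    for item in items:
        is_number = isinstance(item, (int, float)) and not isinstance(item, bool)
        if is_number:
            tokens.append("<SCALAR>")    # hypothetical placeholder token
            scalars.append(float(item))
        else:
            tokens.append(str(item))
            scalars.append(None)
    return EncodedSequence(tokens, scalars)


# A binding-strength question framed as "sequence in, sequence out".
# The task tag and the value 7.2 are made up for illustration.
query = encode(["<TASK:BINDING>", "<PROTEIN>", "M", "K", "<SMILES>", "C", "O"])
target = encode([7.2])

print(query.tokens)
print(target.tokens, target.scalars)
```

In this framing a regression task and a generation task differ only in what the target sequence contains: a single numeric slot in one case, a run of amino-acid tokens in the other.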

Testing the Model Across the Drug Pipeline

To see whether this unified approach truly helps, the authors tested fine-tuned versions of MAMMAL on eleven different benchmarks that mimic real steps in drug discovery. These included recognizing cell types from single-cell gene expression data, predicting whether small molecules can cross the blood–brain barrier or cause toxic side effects, estimating how cancer cells respond to various drugs, and forecasting how strongly proteins will bind to each other or to small-molecule drugs. MAMMAL reached or surpassed the best reported performance in nine out of eleven tests, often beating highly specialized models that were tuned for just one data type.

Designing Antibodies and Beating Structure Models at Their Own Game

Some of the most striking results came from protein-based tasks. In an antibody “infilling” challenge—where the goal is to fill in the most variable segments that actually contact a target—MAMMAL recovered the correct amino acids far more often than earlier methods, especially in the notoriously hard central region of the antibody’s binding site. The team also asked whether MAMMAL could tell binding and non-binding antibodies apart and compared it to AlphaFold 3, a structure-prediction tool whose confidence scores can be used as an indirect guess at binding. On five of seven test targets, including large and flexible proteins relevant to cancer, MAMMAL’s binding predictions were clearly more accurate, even though it only saw sequences and not 3D structures.
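Infilling results of this kind are commonly summarized as a recovery rate: the fraction of masked positions where the predicted amino acid equals the true one. The sketch below computes that fraction under the assumption of equal-length, position-aligned sequences; the function name and both sequences are made up for illustration, and the paper's exact evaluation protocol may differ.

```python
def amino_acid_recovery(true_seq: str, pred_seq: str, start: int, end: int) -> float:
    """Fraction of positions in [start, end) where the prediction matches the truth."""
    if len(true_seq) != len(pred_seq):
        raise ValueError("sequences must be aligned and of equal length")
    if not 0 <= start < end <= len(true_seq):
        raise ValueError("masked span must lie inside the sequence")
    matches = sum(t == p for t, p in zip(true_seq[start:end], pred_seq[start:end]))
    return matches / (end - start)


# Made-up example: positions 4..7 were masked and then filled in by a model.
true_region = "ARDYYGSSYFDY"
pred_region = "ARDYYGSAYFDV"

print(amino_acid_recovery(true_region, pred_region, 4, 8))
```

Here three of the four masked residues are recovered, so the call prints 0.75.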

Hints of Real-World Impact

Beyond benchmarks, the researchers checked whether the model’s predictions matched lab reality. They examined four cancer drugs, including Carfilzomib, which is approved mainly for blood cancers. MAMMAL correctly predicted the relative strength of these drugs across hundreds of cell lines, and this ranking was confirmed in focused experiments. The finding hints that Carfilzomib might have broader use in solid tumors than currently appreciated, a possibility that now warrants further testing. The model has also shown promise in collaborations aiming to predict antibody activity against flu viruses and other targets.
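Agreement between a predicted ranking and a measured one is often quantified with Spearman's rank correlation, which is +1 when the two orderings match exactly and -1 when they are reversed. The article does not say which statistic the authors used, so the following is only an illustrative sketch with made-up potency scores.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Made-up potency scores for four drugs (higher = more potent).
predicted = [0.9, 0.4, 0.7, 0.1]
measured = [0.8, 0.5, 0.6, 0.2]
rho = spearman(predicted, measured)
print(rho)
```

A rank-based measure fits this question because what matters is whether the drugs come out in the same order of strength, not whether the predicted and measured values share a scale.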

What This Means for Future Medicines

In plain terms, MAMMAL acts like a multilingual reader and writer for biology, able to connect what happens at the level of genes, proteins, and chemicals within a single framework. Its strong performance across many tasks suggests that such unified models can become core components of AI-assisted "virtual cells" that help scientists explore treatments in silico before entering the lab. While it does not replace experiments—and still needs careful validation—it can narrow the search space, highlight surprising possibilities, and make the long road from idea to approved drug a bit faster and more efficient.

Citation: Shoshan, Y., Raboh, M., Ozery-Flato, M. et al. MAMMAL - Molecular Aligned Multi-Modal Architecture and Language for biomedical discovery. npj Drug Discov. 3, 14 (2026). https://doi.org/10.1038/s44386-026-00047-4

Keywords: AI-driven drug discovery, multimodal biomedical models, antibody design, protein–drug interactions, gene expression profiling