Clear Sky Science
polyRETRO: a language model approach to predict polymerization class and monomers for a target polymer
Turning digital plastic dreams into real materials
Designing new plastics on a computer is now fast and routine, but actually making those materials in the lab still takes a lot of human guesswork. This paper introduces a tool called polyRETRO that helps chemists figure out how to build a desired polymer from simple starting molecules, potentially speeding up the journey from digital idea to real-world product.
Why making new plastics is still hard
Modern algorithms can suggest polymer structures with desirable properties for electronics, packaging, or medicine. Yet most of these designs never leave the screen because chemists must manually work out how to synthesize them. That means deciding which small molecules to buy or make, which type of reaction to use, and how those pieces fit together into long chains. For everyday small molecules, computer programs already offer this kind of “recipe planning,” but polymers are larger, more complex, and lack the rich reaction databases needed for automated planning.
A language model that speaks chemistry
The authors tackle this gap by teaching large language models, the same kind of AI that powers chatbots, to reason about polymer chemistry. Their system, polyRETRO, starts from a compact text code for a polymer repeat unit called a SMILES string. From this alone, the AI first predicts which broad style of reaction most likely produced the polymer: simple chain growth, stepwise condensation, or ring-opening processes. It then moves on to infer, in plain chemical language, how functional groups changed during the reaction and which monomer molecules must have been present.
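To make the input and the first prediction step concrete, here is a minimal Python sketch. It assumes a common polymer-informatics convention in which the two open ends of a repeat unit are written as `[*]`; the article does not state which notation polyRETRO uses, so treat that as an assumption. The names `PolymerizationClass`, `is_repeat_unit`, and `EXAMPLES` are hypothetical, and the class labels attached to the three textbook polymers are illustrative, not taken from the paper's data.

```python
from enum import Enum


class PolymerizationClass(Enum):
    """The three broad reaction styles named in the text."""
    CHAIN_GROWTH = "chain growth"
    STEP_GROWTH = "stepwise condensation"
    RING_OPENING = "ring opening"


def is_repeat_unit(smiles: str) -> bool:
    """Check the assumed convention: exactly two [*] attachment points."""
    return smiles.count("[*]") == 2


# Textbook examples in the assumed notation (not from the paper's dataset).
EXAMPLES = {
    "[*]CC[*]": PolymerizationClass.CHAIN_GROWTH,             # polyethylene
    "[*]OCCOC(=O)c1ccc(cc1)C(=O)[*]": PolymerizationClass.STEP_GROWTH,  # PET
    "[*]OCCCCCC(=O)[*]": PolymerizationClass.RING_OPENING,    # polycaprolactone
}

for smiles, label in EXAMPLES.items():
    print(smiles, "->", label.value, "| valid input:", is_repeat_unit(smiles))
```

In the system described here, a fine-tuned language model would supply the label; the dictionary above only stands in for that prediction.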
Templates that bridge words and molecules
To make this possible, the team assembled more than 11,000 documented polymerization routes and distilled them into reaction “templates.” Each template describes, in human-readable terms, how certain functional groups on monomers combine to form a bond in the polymer chain, such as turning an alcohol and an acid into an ester link. Instead of comparing detailed atom-by-atom patterns, the language model learns to map from the polymer’s SMILES code directly to one of these templates. This approach keeps the chemical logic interpretable while allowing the AI to generalize across many different structures.
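One way to picture such a template is as a small record that names the reacting functional groups and the link they form. The sketch below is an assumption about how this could be stored, not the authors' data format; `ReactionTemplate`, `TEMPLATES`, and the field names are hypothetical. The ester example comes from the text above, and the amide entry is added only as a second familiar case.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ReactionTemplate:
    """Human-readable description of one bond-forming step (hypothetical schema)."""
    name: str
    monomer_groups: Tuple[str, ...]  # functional groups present on the monomers
    linkage: str                     # link that appears in the polymer chain

    def describe(self) -> str:
        return " + ".join(self.monomer_groups) + " -> " + self.linkage


# A tiny illustrative library keyed by the label a model might emit.
TEMPLATES: Dict[str, ReactionTemplate] = {
    t.name: t
    for t in (
        ReactionTemplate("esterification", ("alcohol", "carboxylic acid"), "ester"),
        ReactionTemplate("amidation", ("amine", "carboxylic acid"), "amide"),
    )
}

predicted_label = "esterification"  # stand-in for the language model's output
print(TEMPLATES[predicted_label].describe())  # alcohol + carboxylic acid -> ester
```

Because the model only has to pick a label from a fixed library, its choice can be read and checked by a chemist, which is the interpretability benefit the paragraph describes.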
From polymer backbones back to building blocks
Once a template is chosen, polyRETRO effectively runs the reaction backward. It imagines the repeat unit as part of a ring, then “cuts” the specific bond that would have formed during polymerization. The resulting fragments, called synthons, are then completed into realistic monomer molecules according to the template’s rules. For polymers made from opening small rings, this step is even simpler: the model just recloses the chain segment into its original ring-shaped monomer.
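The ring-opening case is simple enough to sketch with plain string handling. The toy function below assumes the `[*]` end-marker notation (an assumption, since the article does not name a notation) and joins the two ends of a linear repeat unit with a SMILES ring-closure digit. The function name `reclose_ring` is hypothetical, the code does no chemical validation, and a real implementation would more likely work on a molecular graph with a cheminformatics toolkit.

```python
import re

# One SMILES atom token: a bracket atom, a two-letter halogen,
# or a one-letter organic-subset atom (aliphatic or aromatic).
ATOM = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def reclose_ring(repeat_unit: str) -> str:
    """Toy ring closure: bond the two [*] ends of a linear repeat unit."""
    if not (repeat_unit.startswith("[*]") and repeat_unit.endswith("[*]")):
        raise ValueError("expected a repeat unit of the form [*]...[*]")
    body = repeat_unit[3:-3]
    if "[*]" in body or len(ATOM.findall(body)) < 3:
        raise ValueError("need a linear segment of at least three atoms")
    first = ATOM.match(body)
    if first is None:
        raise ValueError("segment must start with an atom")
    # Pick a ring-closure digit that does not appear anywhere in the segment.
    digit = next((d for d in "123456789" if d not in body), None)
    if digit is None:
        raise ValueError("no free ring-closure digit")
    # The trailing [*] is bonded to the atom that precedes any final branches,
    # so step back over balanced parentheses to find where that atom ends.
    end = len(body)
    while end > 0 and body[end - 1] == ")":
        depth = 0
        for i in range(end - 1, -1, -1):
            if body[i] == ")":
                depth += 1
            elif body[i] == "(":
                depth -= 1
                if depth == 0:
                    end = i
                    break
        else:
            raise ValueError("unbalanced parentheses")
    if end <= first.end():
        raise ValueError("first and last chain atoms coincide")
    return body[:first.end()] + digit + body[first.end():end] + digit + body[end:]


print(reclose_ring("[*]OCCCCCC(=O)[*]"))  # O1CCCCCC1(=O)
print(reclose_ring("[*]CCO[*]"))          # C1CO1
```

The first result is a seven-membered cyclic ester (caprolactone) and the second is a three-membered cyclic ether (ethylene oxide). The step-growth case, where a bond is cut and the fragments are capped according to a template, needs real graph operations and is not attempted here.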
How well does the system work?
Across thousands of test cases, the authors' fine-tuned GPT model correctly identified the reaction class about 98 percent of the time and chose the right reaction template more than 90 percent of the time for both major polymerization families studied. When the full pipeline, including the final monomer prediction step, was tested on unseen polymers, it recovered the correct starting monomers in roughly 88 percent of cases. Many of the remaining predictions were nearly right, differing only in small end groups, and would still be practical in a lab setting.
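A figure like "correct monomers in 88 percent of cases" is an exact-match accuracy: a prediction counts only if the whole set of monomers agrees with the reference. The sketch below shows one plausible way to compute such a score; it is an assumption about the scoring, not the paper's evaluation code. The function name `exact_match_accuracy` is hypothetical, the four toy cases are made up for illustration, and the comparison assumes both sides already use the same canonical SMILES spelling.

```python
from typing import Sequence


def exact_match_accuracy(
    predicted: Sequence[Sequence[str]],
    reference: Sequence[Sequence[str]],
) -> float:
    """Fraction of cases where the predicted monomer set equals the reference set."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("need at least one test case")
    # Monomer order does not matter, so compare as sets.
    hits = sum(frozenset(p) == frozenset(r) for p, r in zip(predicted, reference))
    return hits / len(reference)


# Toy data for illustration only (not results from the paper).
predicted = [
    ["OCCO", "OC(=O)c1ccc(cc1)C(=O)O"],
    ["C=C"],
    ["O=C1CCCCCO1"],
    ["C=CC"],
]
reference = [
    ["OC(=O)c1ccc(cc1)C(=O)O", "OCCO"],  # same pair, different order
    ["C=C"],
    ["O=C1CCCCCO1"],
    ["C=Cc1ccccc1"],                      # mismatch
]

print(exact_match_accuracy(predicted, reference))  # 0.75
```

A strict score of this kind treats the "nearly right" predictions mentioned above as misses, so it understates how often the suggested monomers would still be useful.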
What this means for future materials
To a non-specialist, polyRETRO can be viewed as a translator that takes a desired plastic structure and suggests plausible ingredient lists and assembly steps. While the current work does not yet recommend catalysts, solvents, or temperatures, it offers chemists a clear, interpretable starting point for planning syntheses. As the approach is expanded to more complex polymers and richer reaction conditions, it could help turn the growing flood of AI-designed materials into substances that can actually be made, tested, and used in everyday technologies.
Citation: Agarwal, S., Xiong, W. & Ramprasad, R. polyRETRO: a language model approach to predict polymerization class and monomers for a target polymer. npj Artif. Intell. 2, 52 (2026). https://doi.org/10.1038/s44387-026-00113-2
Keywords: polymer retrosynthesis, large language models, polymer design, monomer prediction, polymer informatics