Clear Sky Science
High Entropy Alloys Database generated with Large Language Model
Why New Metals Matter for Everyday Life
From electric cars and wind turbines to smartphones and solar panels, modern technologies depend on a small set of critical metals. Many of these elements are scarce, expensive, or tied to fragile supply chains. Scientists are therefore racing to invent new kinds of metal mixtures that can replace them. This paper presents a large, openly available database of such materials—called high entropy alloys—built not by hand, but by using advanced language models to read and summarize thousands of research papers automatically.

What Makes These Alloys So Special
Traditional alloys, like steel or bronze, usually mix one main metal with small amounts of others. High entropy alloys take a different approach: they blend several elements in roughly equal amounts, creating an enormous design space of possible combinations. Some of these mixtures have remarkable properties, such as exceptional strength, resistance to wear, or useful catalytic behavior for chemical reactions. But with tens of thousands of potential recipes, it would be impossible for scientists to test them all in the lab using trial and error.
Letting Machines Read the Literature
To navigate this vast space, the authors turned to large language models—the same type of artificial intelligence that can summarize articles or answer questions. They gathered 4,625 full-text scientific papers on high entropy alloys from major publishers, converting structured files and even complex PDF layouts into machine-readable text. The language model was then guided through carefully designed question sequences, asking it to identify each alloy, its chemical makeup, how it was made, what crystal structure it formed, and whether the study was theoretical or experimental.
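As a rough illustration of a guided question sequence, the sketch below asks the five kinds of questions in order and feeds earlier answers back into each new prompt. The question wording, the `ask_model` callable, and the `stub_model` stand-in are assumptions made for this example; they are not the authors' prompts or any particular model's API.

```python
from typing import Callable, Dict, List

# Hypothetical question sequence; the paper's actual prompts are not reproduced here.
QUESTIONS: List[str] = [
    "Which alloys are studied in this paper?",
    "What is the chemical composition of each alloy?",
    "How was each alloy made?",
    "What crystal structure did each alloy form?",
    "Is the study theoretical or experimental?",
]


def run_question_sequence(paper_text: str, ask_model: Callable[[str], str]) -> Dict[str, str]:
    """Ask the questions in order, including all earlier answers in each prompt."""
    answers: Dict[str, str] = {}
    for question in QUESTIONS:
        # Replay the earlier question/answer pairs so the model keeps its context.
        history = "\n".join(f"Q: {q}\nA: {a}" for q, a in answers.items())
        prompt = f"Paper:\n{paper_text}\n\n{history}\n\nQ: {question}\nA:"
        answers[question] = ask_model(prompt)
    return answers


def stub_model(prompt: str) -> str:
    """Stand-in for a real language model: reports how many earlier answers it saw."""
    return f"answer given {prompt.count('A: ')} earlier answers"


result = run_question_sequence("CoCrFeMnNi was arc melted.", stub_model)
```

In practice `ask_model` would wrap a call to whichever language model is used; the stub only makes the chaining visible.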
Turning Free-Form Text into Structured Knowledge
The team relied on a step-by-step prompting strategy in which each question to the model built on its previous answers, so that the model kept a consistent understanding of each paper. The model’s replies were constrained to a semi-structured format, similar to nested lists or tables, that could later be parsed into a clean database. In the end, the system distilled information on 12,427 different high entropy alloys, each linked back to the original paper by its digital identifier for easy checking and reuse. Alongside the alloys themselves, the database captures key details such as the number of phases present, the type of crystal lattice, and the methods used to synthesize or simulate the materials.
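To make the parsing step concrete, here is a minimal sketch that turns a nested-list reply into one database row per alloy, each tagged with the paper's identifier. The reply layout, the field names, and the placeholder DOI are assumptions for illustration; the format actually used in the paper may differ.

```python
from typing import Dict, List

# Hypothetical model reply: top-level items are alloys, indented items are their details.
SAMPLE_REPLY = """\
- alloy: CoCrFeMnNi
  - phases: 1
  - lattice: FCC
  - synthesis: arc melting
- alloy: AlCoCrFeNi
  - phases: 2
  - lattice: BCC + FCC
  - synthesis: arc melting
"""


def parse_reply(reply: str, doi: str) -> List[Dict[str, str]]:
    """Convert a nested '- key: value' list into flat rows, one per top-level item."""
    rows: List[Dict[str, str]] = []
    for line in reply.splitlines():
        stripped = line.strip()
        if not stripped.startswith("- ") or ":" not in stripped:
            continue  # skip blank lines and anything that is not a list item
        key, value = stripped[2:].split(":", 1)
        key, value = key.strip(), value.strip()
        if not line.startswith((" ", "\t")):
            # An unindented item starts a new alloy record linked to its paper.
            rows.append({"doi": doi, key: value})
        elif rows:
            # An indented item adds a detail to the most recent alloy.
            rows[-1][key] = value
    return rows


# The DOI below is a placeholder, not a real paper.
rows = parse_reply(SAMPLE_REPLY, doi="10.0000/example")
```

Each resulting dictionary corresponds to one row that could be written to a table, with the `doi` field providing the link back to the source paper.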

What the Database Reveals About Current Research
Because the database is so large, it offers an overview of how the field has evolved. Many of the recorded alloys form just one dominant crystal type, most often in familiar patterns known to give a good balance between strength and toughness. The records also show that classic melting techniques, such as arc melting, remain the workhorses for producing these materials, while powder-based and laser-based methods are growing but still secondary. A separate analysis of which elements appear most often highlights a heavy focus on certain transition metals and compositions related to a widely studied reference alloy, indicating where researchers have concentrated their efforts so far.
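An element-frequency analysis of this kind can be sketched in a few lines. The example below counts how many alloys contain each element; the four compositions are illustrative inputs chosen for this example, not records taken from the database, and the simple pattern assumes formulas written with standard element symbols.

```python
import re
from collections import Counter
from typing import Iterable

# One uppercase letter optionally followed by one lowercase letter, e.g. "Fe" or "V".
ELEMENT_PATTERN = re.compile(r"[A-Z][a-z]?")


def element_frequencies(formulas: Iterable[str]) -> Counter:
    """Count the number of alloys in which each element appears."""
    counts: Counter = Counter()
    for formula in formulas:
        # Use a set so an element is counted once per alloy.
        counts.update(set(ELEMENT_PATTERN.findall(formula)))
    return counts


# Illustrative compositions only.
sample_formulas = ["CoCrFeMnNi", "AlCoCrFeNi", "Al0.5CoCrCuFeNi", "TiZrNbHfTa"]
counts = element_frequencies(sample_formulas)
```

Calling `counts.most_common(5)` would then list the most frequently used elements in the sample.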
Checking the Machine’s Work
To judge how trustworthy this automated extraction is, experts manually evaluated a sample of 50 papers. They compared their own carefully curated answers with what the language model and the follow-up processing scripts produced. The results showed high accuracy in basic tasks such as identifying the alloy composition and study type, with correctness above 90% in some cases. More complex tasks, like splitting subtle phase information into database-ready rows, were more error-prone, with accuracy falling to the high 70s (in percent) after formatting. This analysis allowed the authors to pinpoint where the pipeline works reliably and where future improvements are needed.
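A comparison like this reduces to a per-field accuracy: the share of records where the extracted value matches the expert's. The sketch below assumes exact matching and uses invented records and field names purely for illustration; the paper's scoring criteria may be more nuanced.

```python
from typing import Dict, List


def field_accuracy(
    expert: List[Dict[str, str]],
    extracted: List[Dict[str, str]],
    fields: List[str],
) -> Dict[str, float]:
    """Return, for each field, the fraction of records where both sources agree."""
    if len(expert) != len(extracted) or not expert:
        raise ValueError("expert and extracted must be non-empty and the same length")
    accuracy: Dict[str, float] = {}
    for field in fields:
        correct = sum(
            1 for e, x in zip(expert, extracted) if e.get(field) == x.get(field)
        )
        accuracy[field] = correct / len(expert)
    return accuracy


# Invented records for illustration; they are not results from the paper.
expert_rows = [
    {"composition": "CoCrFeMnNi", "phase": "FCC"},
    {"composition": "AlCoCrFeNi", "phase": "BCC + FCC"},
    {"composition": "TiZrNbHfTa", "phase": "BCC"},
    {"composition": "Al0.5CoCrCuFeNi", "phase": "FCC"},
]
extracted_rows = [
    {"composition": "CoCrFeMnNi", "phase": "FCC"},
    {"composition": "AlCoCrFeNi", "phase": "BCC"},  # phase detail lost in formatting
    {"composition": "TiZrNbHfTa", "phase": "BCC"},
    {"composition": "Al0.5CoCrCuFeNi", "phase": "FCC"},
]
scores = field_accuracy(expert_rows, extracted_rows, ["composition", "phase"])
```

Computing the score separately for the raw model reply and for the formatted rows would show how much accuracy is lost at each stage.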
Why This Resource Matters
For non-specialists, the key outcome is that there is now an openly accessible, large-scale map of today’s high entropy alloy research, created automatically from the scientific literature. This database does not by itself discover the perfect new material, but it gives researchers and engineers a powerful starting point: a structured, searchable foundation on which advanced data analysis and machine learning can build. By greatly reducing the time needed to gather and standardize past results, the work accelerates the search for new metal mixtures that could support cleaner energy, more durable devices, and a more resilient supply of critical materials.
Citation: Chizhevskiy, V., Marković, G., Benrazzouq, Se. et al. High Entropy Alloys Database generated with Large Language Model. Sci Data 13, 612 (2026). https://doi.org/10.1038/s41597-026-06930-z
Keywords: high entropy alloys, materials database, large language models, materials discovery, automated text mining