Clear Sky Science
FePTP: A text-mined dataset of transformation pathways among iron-containing phases
Why iron’s hidden journeys matter
Iron quietly shapes our planet, from the strength of steel to the way soils lock away carbon. Yet what scientists know about how iron minerals change form deep underground, in sediments, or inside industrial equipment is scattered across thousands of research papers. This study brings those hidden stories together by using artificial intelligence to read the literature and assemble a large, searchable map of how iron-containing materials transform under different conditions. That map can help geologists, environmental scientists, and engineers better understand how iron behaves in nature and in technology.
Iron’s many faces in nature and technology
Iron is both abundant and restless. In Earth’s crust and oceans, as well as in ores and steel, it appears in many mineral forms that can switch from one to another when temperature, water, oxygen, or microbes change. These transformations influence how ore deposits form, how soils hold or release organic carbon, and how steel gains its strength. For example, the shift between austenite and ferrite controls the properties of steel, while the conversion of poorly ordered ferrihydrite into more stable minerals affects how much carbon sediments can store. Understanding these shifts across all the different environments where iron appears requires pulling together a great deal of scattered experimental evidence.

Turning scattered reports into one shared resource
The authors created FePTP, the first text-mined dataset dedicated to transformation pathways among iron-containing phases. Instead of running new experiments, they built a pipeline that automatically searches existing articles, downloads full text, and converts it into a machine-readable form. The system then filters for papers that genuinely discuss phase changes in iron minerals, rather than just mentioning iron in passing. From each selected paper, it extracts pathways that describe how a “precursor” phase turns into a “product” phase, along with the conditions, such as temperature, pH, pressure, or the presence of other chemicals. Each record also notes whether a change truly occurred and includes reaction equations when available.
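To make the idea of "filtering for papers that genuinely discuss phase changes" concrete, here is a deliberately crude keyword screen. It is a stand-in for illustration, not the authors' method (their pipeline relies on language models); the word lists, the function name looks_relevant, and the threshold min_phase_mentions are all hypothetical.

```python
import re

# Hypothetical, tiny word lists for illustration only; the real pipeline
# uses language models and a glossary of over a thousand phases.
IRON_PHASES = {"ferrihydrite", "goethite", "hematite", "magnetite", "lepidocrocite"}
CHANGE_WORDS = {"transform", "transformed", "transformation", "converted",
                "conversion", "recrystallized", "recrystallization"}

def looks_relevant(text: str, min_phase_mentions: int = 2) -> bool:
    """Crude screen: enough iron-phase mentions plus at least one change word."""
    words = re.findall(r"[a-z]+", text.lower())
    phase_hits = sum(1 for w in words if w in IRON_PHASES)
    has_change = any(w in CHANGE_WORDS for w in words)
    return phase_hits >= min_phase_mentions and has_change
```

A sentence such as "Ferrihydrite transformed to goethite and hematite" passes, while one that only mentions iron as a nutrient does not. A keyword screen like this is cheap but misses paraphrases, which is one reason a language-model filter is attractive.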
How artificial intelligence learns iron’s story
To tackle the varied language scientists use, the pipeline combines large language models with smaller, specialized models. A glossary of over a thousand iron-bearing phases helps the system recognize minerals even when authors use nicknames, abbreviations, or sample codes. The pipeline works in stages: it first scans article abstracts to sketch possible transformation pathways, then revisits the full text and tables to fill in details like exact temperatures, times, and solution chemistry. Afterwards, additional models and rule-based checks clean the results, correct errors using passages retrieved from the original papers, and discard vague or inconsistent pathways. This careful curation turns messy text into a consistent structure that computers and humans can both navigate.
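The glossary lookup and the rule-based clean-up described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the glossary entries, the function names normalize_phase and clean_pathway, and the two rejection rules are hypothetical examples, not FePTP's actual glossary or checks.

```python
from typing import Optional, Tuple

# Hypothetical glossary excerpt: lowercase variants (names, abbreviations,
# formulas) mapped to one canonical phase name. Illustrative entries only.
GLOSSARY = {
    "ferrihydrite": "ferrihydrite", "fh": "ferrihydrite",
    "2-line ferrihydrite": "ferrihydrite",
    "goethite": "goethite", "gt": "goethite", "α-feooh": "goethite",
    "hematite": "hematite", "hm": "hematite", "α-fe2o3": "hematite",
    "magnetite": "magnetite", "mt": "magnetite", "fe3o4": "magnetite",
}

def normalize_phase(mention: str) -> Optional[str]:
    """Map an author's wording to a canonical phase, or None if unknown."""
    return GLOSSARY.get(mention.strip().lower())

def clean_pathway(precursor: str, product: str,
                  transformed: bool) -> Optional[Tuple[str, str, bool]]:
    """Normalize a pathway; return None if it is vague or inconsistent."""
    pre, prod = normalize_phase(precursor), normalize_phase(product)
    if pre is None or prod is None:
        return None  # unrecognized phase (e.g. a bare sample code): too vague
    if transformed and pre == prod:
        return None  # claims a change, but start and end phases are identical
    return (pre, prod, transformed)
```

For example, clean_pathway("Fh", "α-FeOOH", True) yields a ferrihydrite-to-goethite record, while a "no change" record such as clean_pathway("Gt", "goethite", False) is kept, since unchanged outcomes are informative too.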

What the dataset contains
The final FePTP dataset holds 11,241 transformation pathways drawn from 4,245 papers, covering more than 730 different iron-containing phases. It includes both cases where a mineral clearly changes and cases where no change was observed under certain conditions, which are just as informative for understanding what keeps a phase stable. Each pathway lists the starting and ending phases, the likely driving process (such as heating in solids, dissolving and re-precipitating, melting, or microbial action), and the step-by-step operations, such as heating, aging, mixing, or adding reagents. Conditions are standardized into common units, and chemical names are linked to unique digital identifiers, making it easier to compare studies and run large-scale analyses.
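One way to picture a single pathway entry, together with the kind of unit standardization mentioned above, is a small typed record. The field names, the choice of degrees Celsius as the common unit, and the helper to_celsius are assumptions made for illustration; they are not the published FePTP schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PathwayRecord:
    # Hypothetical fields mirroring the contents described in the text,
    # not the actual FePTP schema.
    precursor: str
    product: str
    mechanism: str                      # e.g. "solid-state", "dissolution-reprecipitation"
    operations: List[str] = field(default_factory=list)  # e.g. ["mixing", "aging"]
    temperature_c: Optional[float] = None  # standardized to degrees Celsius
    transformed: bool = True            # False records "no change observed"

def to_celsius(value: float, unit: str) -> float:
    """Convert a reported temperature to degrees Celsius."""
    unit = unit.strip().upper()
    if unit in ("C", "°C"):
        return value
    if unit == "K":
        return value - 273.15
    if unit in ("F", "°F"):
        return (value - 32) * 5 / 9
    raise ValueError(f"unsupported temperature unit: {unit}")
```

Usage: PathwayRecord("ferrihydrite", "hematite", "solid-state", ["heating"], to_celsius(773.15, "K")) stores a paper's Kelvin value in the same unit as every other record, so temperatures from different studies can be compared directly.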
How reliable and useful is the map?
Human experts checked a sample of the automatically extracted pathways and found that most of the detailed entries, such as temperatures, solvents, and reactants, were accurate. Around seven in ten complete pathways were judged correct or only slightly off, while the rest contained larger errors, missing evidence, or redundant information. The authors note that the pipeline still misses some subtle or implicit transformations and cannot yet read complex scientific figures, where many key details reside. Even so, FePTP already offers a rich, structured view of iron’s behavior across laboratory and natural settings, which can support new models of geochemical cycling, help design ways to control phase transformations, and guide future improvements in AI tools for mining knowledge from scientific literature.
What this means for readers
For a non-specialist, the main message is that scientists have taught computers to comb through thousands of papers and stitch together a coherent picture of how iron minerals change form. Instead of inventing a new theory from scratch, this work organizes what is already known into a single, open database that others can explore. This shared resource should make it easier to predict when iron will lock up or release carbon, to reconstruct how ore bodies formed over Earth’s history, and to judge how industrial processes might better harness or avoid certain transformations. FePTP is less a final answer and more a powerful map, pointing researchers toward patterns and pathways that were previously buried in text.
Citation: Lin, L., Ren, C., Xiao, Y. et al. FePTP: A text-mined dataset of transformation pathways among iron-containing phases. Sci Data 13, 752 (2026). https://doi.org/10.1038/s41597-026-07067-9
Keywords: iron mineral transformations, text mining, geochemical cycling, materials data, large language models