Clear Sky Science

Data-efficient machine-learning of complex Fe–Mo intermetallics using domain knowledge of chemistry and crystallography

Why strange crystal patterns matter

Modern jet engines, lightweight steels, and even some hydrogen storage materials depend on metallic alloys whose atoms arrange themselves in intricate 3D patterns. Among the most puzzling of these are so‑called topologically close‑packed (TCP) phases—highly ordered but complex crystal structures that can either strengthen an alloy or make it brittle. Calculating which of these patterns will form in a given alloy is so demanding that even powerful quantum‑mechanical methods struggle. This study shows how carefully designed machine‑learning models, infused with expert knowledge of chemistry and crystallography, can reliably predict the stability of especially complex TCP phases in an iron–molybdenum (Fe–Mo) alloy using surprisingly little data.

Figure 1.

Metal atoms in intricate 3D frameworks

In many alloys, atoms do not just line up in simple cubes; instead they form elaborate frameworks built from polyhedral cages, each atom surrounded by 12 to 16 neighbors. These TCP phases are important because they often appear as precipitates, tiny particles that can dramatically change strength, creep resistance, or corrosion behavior. In Fe–Mo and related systems, simpler TCP phases such as A15, Laves phases, and the σ and μ phases are already known and can be handled with conventional quantum calculations. But more complex relatives, labeled R, M, P, and δ, contain many more distinct atomic sites per crystal unit. Because each site can hold either Fe or Mo, the number of possible arrangements grows exponentially with the number of sites, so exploring them all would require far more expensive simulations than standard computational tools can deliver.
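The scaling behind this bottleneck can be made concrete with a short counting sketch. The site counts below are illustrative assumptions, not values taken from the article, and the function names are hypothetical.

```python
from itertools import product


def count_configurations(n_sites: int, n_species: int = 2) -> int:
    """Count the ways to put one of n_species on each of n_sites."""
    return n_species ** n_sites


def enumerate_configurations(n_sites: int, species=("Fe", "Mo")):
    """Lazily yield every assignment of a species to each site."""
    return product(species, repeat=n_sites)


# Illustrative site counts only; they are not taken from the article.
for n_sites in (5, 11, 14, 53):
    print(n_sites, count_configurations(n_sites))

# Small cases can be listed explicitly: 3 sites give 8 assignments.
small = list(enumerate_configurations(3))
```

Every additional site doubles the count for a two-element alloy, which is why treating symmetry-equivalent atoms as one group, rather than each atom separately, matters so much for the size of the search.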

Teaching machines using expert hints

The authors tackled this bottleneck by training machine‑learning models on fewer than 300 quantum‑mechanical calculations, performed with density functional theory (DFT), for the simpler TCP phases in Fe–Mo. Instead of feeding the algorithms only raw composition information (how much Fe and Mo are present), they built rich descriptors that embed domain knowledge. These descriptors encode atomic properties such as valence electrons and atomic volume, the local geometric environment around each atom, and how atoms occupy lattice sites with specific coordination numbers. By averaging these local fingerprints over groups of sites that share the same local geometry, the models “see” not just which elements are present, but how they sit within the 3D framework.
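A minimal sketch of the averaging idea, assuming that sites are grouped by coordination number (12, 14, 15, or 16 in TCP phases) and that each atom carries a short feature vector. The function name, the two-feature layout, and the atomic volume values are assumptions made for illustration; the article does not give the authors' actual descriptor construction.

```python
def group_averaged_descriptor(per_atom_features, coordination_numbers,
                              cn_groups=(12, 14, 15, 16)):
    """Average per-atom feature vectors within each coordination-number group.

    Returns the group means concatenated in the order given by cn_groups.
    Groups with no atoms contribute zeros so the output length is fixed.
    """
    n_features = len(per_atom_features[0])
    sums = {cn: [0.0] * n_features for cn in cn_groups}
    counts = {cn: 0 for cn in cn_groups}
    for features, cn in zip(per_atom_features, coordination_numbers):
        if cn not in sums:
            raise ValueError(f"unexpected coordination number: {cn}")
        counts[cn] += 1
        for k, value in enumerate(features):
            sums[cn][k] += value
    descriptor = []
    for cn in cn_groups:
        n = counts[cn]
        descriptor.extend(s / n if n else 0.0 for s in sums[cn])
    return descriptor


# Hypothetical five-atom example: [valence electrons, atomic volume].
# The volume values are placeholders chosen for illustration.
FE = [8.0, 11.0]
MO = [6.0, 15.0]
features = [FE, FE, MO, MO, MO]
coordination = [12, 12, 12, 14, 16]
descriptor = group_averaged_descriptor(features, coordination)
print(descriptor)
```

Two structures with the same overall composition but different placements of Fe and Mo on the coordination groups produce different vectors, which is the information a composition-only input cannot carry.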

From local neighborhoods to reliable energy maps

To capture the subtle differences between competing crystal arrangements, the team borrowed advanced descriptors from several families of interatomic models. SOAP and ACE descriptors summarize the shapes and arrangements of neighboring atoms, while bond‑order‑based descriptors summarize features of the electronic structure, such as how wide the band of allowed electron energies is. These per‑atom fingerprints are combined and averaged in a way that respects the crystal’s internal architecture. The authors then trained relatively simple regression models—kernel ridge regression, small neural networks, and random forests—while systematically testing which features actually improved predictions. As more crystallographic and chemical knowledge was built into the descriptors, the prediction error dropped dramatically, ultimately reaching around 20 millielectronvolts per atom, a level comparable to the accuracy of the underlying DFT data.
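Of the regression models named above, kernel ridge regression is compact enough to sketch in plain Python. The version below assumes a Gaussian kernel and solves the regularized linear system directly; the kernel choice, hyperparameters, and toy data are assumptions for illustration and are not the authors' settings.

```python
import math


def gaussian_kernel(x, y, length_scale=1.0):
    """Similarity between two descriptor vectors; 1.0 when they coincide."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_krr(train_x, train_y, ridge=1e-8, length_scale=1.0):
    """Return weights alpha solving (K + ridge * I) alpha = y."""
    n = len(train_x)
    gram = [[gaussian_kernel(train_x[i], train_x[j], length_scale)
             + (ridge if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    return solve_linear_system(gram, train_y)


def predict_krr(train_x, weights, x, length_scale=1.0):
    """Predict as a kernel-weighted sum over the training descriptors."""
    return sum(w * gaussian_kernel(t, x, length_scale)
               for w, t in zip(weights, train_x))


# Toy one-dimensional descriptors and energies, for illustration only.
toy_x = [[0.0], [1.0], [2.0]]
toy_y = [0.0, 1.0, 4.0]
toy_weights = fit_krr(toy_x, toy_y)
print(predict_krr(toy_x, toy_weights, [1.5]))
```

In practice a library implementation would replace the hand-written solver, and the ridge strength and kernel width would be chosen by cross-validation on the training set.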

Figure 2.

Revealing hidden phases in iron–molybdenum

Armed with these compact yet powerful models, the researchers scanned through all possible Fe/Mo arrangements on the many atomic sites of the four complex TCP phases R, M, P, and δ. They mapped out the formation energies for tens of thousands of configurations to determine which ones lie on the “convex hull,” the set of lowest‑energy states that defines thermodynamic stability. The models predict that the R phase in Fe–Mo can achieve negative formation energies, meaning it is lower in energy than a mixture of the pure elements, near compositions where experiments indeed observe it at high temperature. The M phase appears as a close competitor, while the P and δ phases remain consistently less favorable and are unlikely to form in this alloy. Additional DFT checks on selected configurations confirmed the machine‑learning predictions, especially for the R and M phases.
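The convex hull construction for a two-element alloy can be sketched as follows: compute each configuration's formation energy relative to the pure elements, keep the lowest energy at each composition, and trace the lower envelope. The function names and all numerical values are hypothetical; this is a generic textbook construction, not the authors' code.

```python
def formation_energy(e_per_atom, x_mo, e_fe, e_mo):
    """Energy per atom relative to pure Fe and pure Mo at Mo fraction x_mo."""
    return e_per_atom - (1.0 - x_mo) * e_fe - x_mo * e_mo


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_convex_hull(points):
    """Return the lower hull of (x_mo, formation_energy) points, sorted by x_mo."""
    lowest = {}
    for x, e in points:
        # Only the lowest-energy configuration at each composition can matter.
        if x not in lowest or e < lowest[x]:
            lowest[x] = e
    hull = []
    for p in sorted(lowest.items()):
        # Drop earlier points that would lie on or above the new segment.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical formation energies in eV per atom, for illustration only.
candidates = [
    (0.0, 0.0), (0.25, 0.05), (0.5, -0.02), (0.5, 0.03),
    (0.75, 0.01), (1.0, 0.0),
]
hull = lower_convex_hull(candidates)
print(hull)
```

Configurations on the hull are stable against decomposition into any mixture of the others; everything above it is at best metastable at zero temperature.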

Experiment closes the loop

To test the predictive power of their approach more stringently, the authors compared model results to new synchrotron X‑ray diffraction data for an Fe–Mo sample containing the R phase. Their model, combined with a standard thermodynamic approximation, predicts how frequently each crystallographic site should be occupied by Fe or Mo at high temperature. These predicted site occupancies match the refined experimental values remarkably well and follow classic “Kasper rules,” which state that larger atoms (Mo in this alloy) preferentially occupy sites with more neighbors. This agreement shows that the model not only gets overall energies right but also captures delicate differences between nearly equivalent atomic arrangements.
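The article does not name the thermodynamic approximation used. As one possible illustration, the sketch below assumes that configurations of equal composition are populated according to Boltzmann weights and that a site's Mo occupancy is the weighted fraction of configurations placing Mo there. The function name and the example energies are hypothetical.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def site_occupancies(configurations, energies_ev, temperature_k):
    """Boltzmann-weighted probability of finding Mo on each site.

    configurations: tuples with 1 where a site holds Mo and 0 where it holds Fe.
    energies_ev: energy of each configuration in eV per cell (not per atom).
    """
    beta = 1.0 / (K_B_EV_PER_K * temperature_k)
    e_min = min(energies_ev)  # shift energies to avoid overflow in exp
    weights = [math.exp(-beta * (e - e_min)) for e in energies_ev]
    partition = sum(weights)
    n_sites = len(configurations[0])
    return [
        sum(w * cfg[i] for w, cfg in zip(weights, configurations)) / partition
        for i in range(n_sites)
    ]


# Hypothetical two-site example: Mo on site 0 is 0.1 eV cheaper than on site 1.
configs = [(1, 0), (0, 1)]
energies = [0.0, 0.1]
occupancy = site_occupancies(configs, energies, temperature_k=1000.0)
print(occupancy)
```

At low temperature the cheaper placement dominates, while at high temperature the occupancies approach an even split, which is why small energy differences between sites translate into measurable but partial site preferences.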

What this means for future materials design

By embedding chemical and structural insight directly into machine‑learning descriptors, this work delivers accurate predictions for extremely complex crystal structures using only a modest training set. For alloy designers, it opens the door to routinely exploring TCP phases that were previously too intricate for brute‑force quantum calculations, helping to identify which phases will strengthen or weaken a material and under what conditions they appear. More broadly, the study illustrates that data‑efficient, trustworthy machine learning in materials science is possible when models are guided by the same physical reasoning human experts use, rather than relying on raw data alone.

Citation: Forti, M., Malakhova, A., Lysogorskiy, Y. et al. Data-efficient machine-learning of complex Fe–Mo intermetallics using domain knowledge of chemistry and crystallography. npj Comput Mater 12, 161 (2026). https://doi.org/10.1038/s41524-026-02070-5

Keywords: machine learning in materials science, intermetallic phases, iron molybdenum alloys, crystal structure prediction, topologically close packed phases