Clear Sky Science

Thermodynamically consistent machine learning model for excess Gibbs energy

2026-04-14

Why this matters for everyday chemistry

Modern life relies on mixtures of liquids, from fuels and refrigerants to pharmaceuticals and green solvents. Designing these mixtures safely and efficiently depends on knowing how their molecules interact. Yet measuring those interactions for every possible combination is impossible. This article presents a new machine-learning tool, called HANNA, that learns the behavior of liquid mixtures directly from data while still respecting the basic laws of thermodynamics. It promises faster, broader, and more reliable predictions to guide chemical process design and materials discovery.

The hidden energy that shapes liquid mixtures

When different liquids are blended, their molecules attract or repel each other in ways that can be quite subtle. These effects are captured in a quantity called the “excess Gibbs energy,” which tells us how much the mixture deviates from ideal behavior. From this single function, engineers can derive key properties such as activity coefficients, which in turn determine whether a mixture forms one liquid phase or splits into two, whether vapor and liquid coexist, and how components distribute between them. Unfortunately, excess Gibbs energy cannot be measured directly. It must be inferred from painstaking experiments on vapor–liquid and liquid–liquid equilibria or heat effects, and only a tiny fraction of all relevant mixtures has ever been studied.
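To make the link between excess Gibbs energy and activity coefficients concrete, here is a minimal sketch based on the one-parameter Margules model, a textbook stand-in chosen purely for illustration. It is not the HANNA model, and the parameter value used below is arbitrary. For this model g^E/(RT) = A x1 x2, which gives ln(gamma1) = A x2^2 and ln(gamma2) = A x1^2. The function names are hypothetical.

```python
def ge_over_rt(x1, a):
    """Dimensionless excess Gibbs energy g^E/(RT) of the one-parameter Margules model."""
    return a * x1 * (1.0 - x1)


def ln_gammas(x1, a):
    """Analytical log activity coefficients that follow from g^E/(RT) = a*x1*x2."""
    x2 = 1.0 - x1
    return a * x2 ** 2, a * x1 ** 2


def ln_gamma1_numeric(x1, a, h=1e-6):
    """Binary relation ln(gamma1) = g + x2 * dg/dx1, with g = g^E/(RT).

    The derivative is approximated by a central difference, so this works
    for any smooth g^E function, not only the Margules model.
    """
    dg_dx1 = (ge_over_rt(x1 + h, a) - ge_over_rt(x1 - h, a)) / (2.0 * h)
    return ge_over_rt(x1, a) + (1.0 - x1) * dg_dx1


if __name__ == "__main__":
    a = 1.5  # arbitrary illustrative value, not fitted to any real mixture
    for x1 in (0.1, 0.5, 0.9):
        ln_g1, ln_g2 = ln_gammas(x1, a)
        print(x1, ln_g1, ln_g2, ln_gamma1_numeric(x1, a))
```

The same differentiation step applies to any g^E model, which is why a single well-behaved g^E function is enough to obtain all activity coefficients of a mixture.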

Limits of traditional prediction tools

For decades, engineers have relied on models like NRTL, UNIQUAC, and the UNIFAC family to estimate mixture behavior. These methods approximate interactions through parameters that are fitted to experimental data, often on a pairwise basis. While powerful, they have important limitations: to predict a new mixture, one usually needs parameters for every binary subsystem that appears within it, and these may not exist for novel compounds. Even group-based approaches like UNIFAC, which decompose molecules into building blocks, are confined to a fixed catalog of groups and can struggle with complex species such as ionic liquids. Moreover, many classical models find it difficult to describe both vapor–liquid and liquid–liquid equilibria accurately with a single parameter set.
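The dependence on pair-specific parameters can be seen in a short sketch of the standard binary NRTL equations. The parameter values in the usage lines are arbitrary placeholders, not fitted to any real system, and the function name is hypothetical. In practice tau12 and tau21 must be regressed from experimental data for each binary pair, which is exactly the bottleneck described above.

```python
import math


def nrtl_ln_gammas(x1, tau12, tau21, alpha=0.3):
    """Log activity coefficients of a binary mixture from the NRTL model.

    tau12, tau21: dimensionless binary interaction parameters (pair-specific).
    alpha: non-randomness parameter; 0.3 is a commonly used default.
    """
    x2 = 1.0 - x1
    g12 = math.exp(-alpha * tau12)
    g21 = math.exp(-alpha * tau21)
    ln_g1 = x2 ** 2 * (
        tau21 * (g21 / (x1 + x2 * g21)) ** 2
        + tau12 * g12 / (x2 + x1 * g12) ** 2
    )
    ln_g2 = x1 ** 2 * (
        tau12 * (g12 / (x2 + x1 * g12)) ** 2
        + tau21 * g21 / (x1 + x2 * g21) ** 2
    )
    return ln_g1, ln_g2


if __name__ == "__main__":
    # Placeholder parameters for a hypothetical binary pair.
    for x1 in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(x1, nrtl_ln_gammas(x1, tau12=1.2, tau21=0.8))
```

Without tau12 and tau21 for a given pair, the function cannot be evaluated at all, whereas a structure-based model only needs the molecules themselves as input.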

A neural network that obeys physical laws

HANNA tackles these challenges by combining modern neural networks with hard-wired thermodynamic rules. As input, it needs only the molecular structures of the components (encoded as SMILES strings), the temperature, and the mixture composition. A chemical language model (ChemBERTa-2) first converts each molecule into a numerical fingerprint. These fingerprints feed into a specialized network architecture that is built to obey key consistency requirements: it respects the Gibbs–Duhem relation, behaves correctly when one component becomes pure or infinitely dilute, and gives the same answer no matter how the components are ordered. Within these constraints, HANNA predicts the excess Gibbs energy for every binary pair in a mixture and then uses a geometric projection scheme to extend those predictions to mixtures with many components, without introducing extra fitting parameters.
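How such constraints can be built into a model by construction, rather than learned, can be illustrated with a toy example. The sketch below is not the published HANNA architecture: the weights are random, the four-dimensional embeddings merely stand in for ChemBERTa-2 fingerprints, and all names are hypothetical. It shows two generic tricks. Summing identical per-component encoders makes the output independent of component order, and multiplying by x1*x2 forces the excess Gibbs energy to vanish for a pure component. Consistency with the Gibbs–Duhem relation then follows whenever activity coefficients are obtained by differentiating this single scalar function.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical random weights: 4 embedding dims + mole fraction + temperature -> 8 hidden units.
W_PHI = rng.normal(size=(6, 8))
W_PSI = rng.normal(size=8)


def phi(embedding, x, t_scaled):
    """Per-component encoder, applied with the same weights to every component."""
    features = np.concatenate([embedding, [x, t_scaled]])
    return np.tanh(features @ W_PHI)


def ge_over_rt(emb1, emb2, x1, t_scaled):
    """Toy g^E/(RT) that is permutation invariant and zero at x1 = 0 and x1 = 1."""
    x2 = 1.0 - x1
    # Sum pooling: swapping the two components leaves the pooled vector unchanged.
    pooled = phi(emb1, x1, t_scaled) + phi(emb2, x2, t_scaled)
    # The x1*x2 prefactor enforces the pure-component limits exactly.
    return x1 * x2 * float(pooled @ W_PSI)


if __name__ == "__main__":
    emb_a = np.array([0.1, -0.2, 0.3, 0.4])   # placeholder fingerprint
    emb_b = np.array([-0.5, 0.2, 0.0, 0.7])   # placeholder fingerprint
    print(ge_over_rt(emb_a, emb_b, 0.3, 0.5))
    print(ge_over_rt(emb_b, emb_a, 0.7, 0.5))  # same mixture, components swapped
    print(ge_over_rt(emb_a, emb_b, 1.0, 0.5))  # pure component
```

Because the constraints hold for any weight values, they remain satisfied before, during, and after training, which is the practical advantage of hard-wiring them.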

Training on real data, not just equations

To make HANNA broadly useful, the authors trained it on an exceptionally large and diverse experimental database. This includes vapor–liquid data with full phase compositions, vapor–liquid data with only total pressures, liquid–liquid phase splits, activity coefficients at infinite dilution, and excess enthalpies, covering more than 800,000 data points and over 4,000 distinct compounds, including ionic liquids and other challenging species. A key innovation is a surrogate solver that emulates a robust thermodynamic algorithm for detecting and locating liquid–liquid splits. This surrogate is differentiable, so HANNA can be trained “end-to-end” against measured phase compositions without resorting to slow iterative calculations inside the learning loop. Additional loss terms encourage HANNA to recognize the curvature associated with phase separation and to produce smooth predictions that behave sensibly even beyond the training range.
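The curvature mentioned here refers to a standard stability criterion: a binary liquid is unstable, and tends to split into two phases, wherever the second derivative of the Gibbs energy of mixing with respect to composition is negative. The sketch below applies that criterion to the one-parameter Margules model, g_mix/(RT) = x1 ln x1 + x2 ln x2 + A x1 x2, used only as a simple stand-in. It is a plain grid scan, not the authors' differentiable surrogate solver, and the function names are hypothetical.

```python
import math


def curvature(x1, a):
    """Second derivative of g_mix/(RT) with respect to x1 for the Margules model."""
    return 1.0 / (x1 * (1.0 - x1)) - 2.0 * a


def has_miscibility_gap(a, n_grid=999):
    """Scan interior compositions; negative curvature anywhere signals instability."""
    return any(curvature(i / (n_grid + 1), a) < 0.0 for i in range(1, n_grid + 1))


def spinodal_points(a):
    """Compositions where the curvature is zero, or None if the liquid is stable everywhere."""
    if a <= 2.0:
        return None
    half_width = 0.5 * math.sqrt(1.0 - 2.0 / a)
    return 0.5 - half_width, 0.5 + half_width


if __name__ == "__main__":
    for a in (1.5, 2.5):  # arbitrary illustrative values
        print(a, has_miscibility_gap(a), spinodal_points(a))
```

Note that the spinodal points only bound the unstable region. The measured coexisting compositions lie outside them and require solving for equal chemical potentials in both phases, which is the iterative calculation that a differentiable surrogate is meant to replace during training.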

How the new model measures up

Once trained, HANNA was tested only on systems that had been held back during training, and its performance was compared against leading classical and machine-learning models. For binary mixtures, it consistently predicted activity coefficients, phase compositions, and excess enthalpies more accurately than the widely used modified UNIFAC (Dortmund) method, while also identifying liquid–liquid miscibility gaps more reliably. For ternary and even quaternary mixtures, which it had never seen during training, HANNA remained competitive or superior, despite relying solely on binary data plus the geometric projection. It also outperformed several recent graph-based neural networks that either lacked strict thermodynamic consistency or were limited to special conditions such as room temperature or infinite dilution.

What this means for science and industry

To a non-specialist, the central message is that HANNA acts like a highly informed, physically grounded “oracle” for liquid mixtures. Given only the molecular structures of the components, the temperature, and the composition, it can predict whether two or more liquids will mix, split into layers, or show more complex phase behavior, and it does so across a wide range of temperatures. Crucially, it does this while honoring the underlying thermodynamic rules, reducing the risk of unphysical results that can plague unconstrained machine-learning models. Because the full model and code are released openly and are accessible through a web interface, engineers can start using HANNA directly in process simulation and solvent screening. While the authors note remaining limitations, such as untested performance far outside the training temperature range and for strong electrolytes, the work marks a major step toward data-driven, thermodynamically consistent design of chemical processes.

Citation: Hoffmann, M., Specht, T., Göttl, Q. et al. Thermodynamically consistent machine learning model for excess Gibbs energy. Nat Commun 17, 3485 (2026). https://doi.org/10.1038/s41467-026-71430-y

Keywords: liquid mixtures, thermodynamics, machine learning, excess Gibbs energy, phase equilibria