Clear Sky Science
Spectral Quantum Chemistry and Infrared Resonance Library for Data-Driven Molecular Spectroscopy
Why invisible light matters
Every object around us, from a headache pill to a plastic bottle, is made of molecules that quietly vibrate. These tiny vibrations interact with infrared light, creating unique “fingerprints” that chemists use to tell substances apart. Infrared spectroscopy, the technique that reads these fingerprints, underpins drug quality checks, pollution monitoring, and materials design. Yet until now, scientists have lacked a large, clean, and open digital library of such fingerprints to train modern AI tools. This article introduces SQuIRL, a new computational database that fills that gap and could change how we design and recognize molecules using data.

A digital fingerprint library for molecules
The heart of this work is SQuIRL, the Spectral Quantum Chemistry and Infrared Resonance Library. Instead of relying on time-consuming lab measurements, the authors used high-level quantum calculations to predict how 133,885 small organic molecules respond to infrared light. For each molecule, SQuIRL stores the positions and strengths of all infrared peaks—the essential ingredients of an infrared spectrum. These molecules come from a well-known chemistry collection called QM9, which already contains detailed structural and electronic information. By adding vibrational fingerprints on top, SQuIRL turns QM9 into a richer playground for data-driven chemistry.
Why existing collections fall short
Over the years, several experimental collections have gathered thousands of infrared spectra, including well-known databases from NIST, SDBS, and commercial vendors. While invaluable, these resources have limits: they tend to cover only common, easy-to-handle molecules, they mix different measurement conditions, and they are often locked behind paywalls or awkward web interfaces that make large-scale analysis difficult. Newer computational datasets and AI-generated libraries go further in size, but they trade off accuracy, openness, or uniformity. SQuIRL is designed to sit at the sweet spot: fully open, large enough for modern machine learning, and computed at a consistently high level of theoretical accuracy.
How the spectra are created
To build SQuIRL, the team ran all calculations with a carefully chosen recipe known in the field for its balanced precision. Each molecule’s shape was taken from QM9 and then analyzed with a quantum mechanical method that captures how electrons move and how atoms vibrate together. From this, the authors extracted the frequencies and intensities of every vibrational mode—the raw building blocks of an infrared spectrum. They intentionally kept these data unprocessed, so users can later shape them into smooth curves or apply corrections as needed. Alongside the spectra, SQuIRL stores a wealth of extra information: how charge is distributed, how easily the molecule’s electrons can be distorted, basic thermodynamic quantities, and even standard line drawings of the structures, all organized in a machine-friendly HDF5 file with a companion index for quick filtering.
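Because the stored data are raw frequencies and intensities, turning them into a smooth curve is left to the user. The sketch below shows one common way to do that: placing a Lorentzian line shape on each vibrational mode and summing them over a wavenumber grid. It is an illustration under stated assumptions, not the authors' procedure. The function names, the 20 cm⁻¹ line width, the optional frequency scaling factor, and the three-mode example molecule are all hypothetical, and the numbers are placeholders rather than values from SQuIRL.

```python
import math


def lorentzian(x, center, fwhm):
    """Area-normalized Lorentzian line shape evaluated at x."""
    half = fwhm / 2.0
    return (half / math.pi) / ((x - center) ** 2 + half ** 2)


def broaden(frequencies, intensities, grid, fwhm=20.0, scale=1.0):
    """Sum one Lorentzian per vibrational mode over a wavenumber grid.

    fwhm  -- full width at half maximum in cm^-1 (assumed value, tune as needed)
    scale -- optional multiplier applied to every frequency; 1.0 leaves the
             raw data unchanged (hypothetical correction hook)
    """
    spectrum = []
    for x in grid:
        total = 0.0
        for freq, inten in zip(frequencies, intensities):
            total += inten * lorentzian(x, freq * scale, fwhm)
        spectrum.append(total)
    return spectrum


# Placeholder stick data for an imaginary molecule (not taken from SQuIRL).
freqs = [1000.0, 1700.0, 3000.0]   # peak positions in cm^-1
intens = [50.0, 200.0, 30.0]       # peak strengths, arbitrary units

# Evaluate the smooth curve every 2 cm^-1 from 400 to 4000 cm^-1.
grid = [float(x) for x in range(400, 4001, 2)]
curve = broaden(freqs, intens, grid)

# Location of the tallest point on the broadened curve.
peak_position = grid[curve.index(max(curve))]
```

Keeping the line width and scaling factor as parameters mirrors the point of storing unprocessed data: the same stick spectrum can be reshaped for different purposes without recomputing anything.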
Checking accuracy and chemical variety
Accuracy and diversity are crucial if machines are to learn from such a library. The authors benchmarked a set of familiar small molecules, such as ammonia, ethanol, and formaldehyde, comparing SQuIRL’s predicted spectra to both top-tier quantum methods and trusted experimental measurements. The differences in peak positions were typically only a few tens of wavenumbers (cm⁻¹, the unit in which infrared peak positions are reported), well within the range accepted for high-quality computational work. Just as important, SQuIRL spans a wide range of chemical “flavors”: common groups such as alcohols and ethers appear alongside less frequent but scientifically important ones like nitro groups and guanidines. Most molecules contain multiple distinct functional features and bonding patterns, and statistical checks show that even within a single class, the structures are not mere repeats of one another. This structural and electronic variety helps avoid bias and makes the dataset especially suitable for training robust AI models.
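A comparison of peak positions like the one described above can be summarized with a simple error measure. The sketch below computes the mean and maximum absolute deviation between computed and reference peaks, assuming the peaks have already been matched one-to-one. The function name and all numbers are hypothetical placeholders, not values from the paper or from any measurement.

```python
def peak_deviations(computed, reference):
    """Return (mean, max) absolute deviation between matched peak positions.

    Both lists hold peak positions in cm^-1, and entry i of one list is
    assumed to correspond to entry i of the other.
    """
    if len(computed) != len(reference):
        raise ValueError("peak lists must have the same length")
    if not computed:
        raise ValueError("peak lists must not be empty")
    deviations = [abs(c - r) for c, r in zip(computed, reference)]
    return sum(deviations) / len(deviations), max(deviations)


# Placeholder peak positions in cm^-1 (illustrative only).
computed_peaks = [1010.0, 1725.0, 3050.0]
reference_peaks = [1000.0, 1700.0, 3000.0]

mean_dev, max_dev = peak_deviations(computed_peaks, reference_peaks)
```

In practice the matching step is the hard part, since computed and measured spectra can differ in the number and order of visible peaks; this sketch deliberately leaves that out.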

A foundation for AI-guided discovery
Seen through the lens of a non-specialist, SQuIRL is like a high-resolution atlas of how small molecules “sound” when probed with invisible infrared light. Because it is large, accurate, and openly available, this atlas can feed new generations of algorithms that read or even design molecules based on their spectral fingerprints—much as speech recognition systems learn from vast archives of recorded voices. By standardizing how the data are stored and by documenting them carefully, the authors make it easy for researchers in academia and industry to plug SQuIRL into their own pipelines. In practical terms, this resource could accelerate tasks ranging from automated structure identification to the guided search for new drugs and materials, bringing a data-driven approach to one of chemistry’s most established experimental tools.
Citation: Krishnadas, A., Kansal, J., Charron, N.E. et al. Spectral Quantum Chemistry and Infrared Resonance Library for Data-Driven Molecular Spectroscopy. Sci Data 13, 618 (2026). https://doi.org/10.1038/s41597-026-07240-0
Keywords: infrared spectroscopy, molecular fingerprints, quantum chemistry data, spectral databases, machine learning in chemistry