A dataset and benchmark of carbonate thin-section images for deep learning
Why Looking at Tiny Rocks Matters
Oil and gas companies, climate scientists, and geologists all care deeply about the stories locked inside rocks. By slicing rocks paper-thin and viewing them under a microscope, experts can read clues about ancient seas, buried reefs, and the pathways that let oil, gas, and water move underground. This paper introduces DeepCarbonate, a large, carefully checked image collection of such rock slices. It is designed so that modern artificial intelligence systems can learn to recognize rock types automatically, making this traditional craft faster, more consistent, and easier to share around the world.

From Hand Sample to Digital Rock Gallery
The project starts from real rocks drilled and sampled in major oil-bearing formations in China’s Sichuan Basin and the United Arab Emirates. Geologists first inspect each rock slice at the scale of the naked eye to be sure the portion they are studying represents the whole. To avoid being misled by local oddities, they examine at least eight different views at two magnifications, checking textures and grains until the overall rock type can be named with confidence. Only then do they fix the microscope settings and capture high-resolution images focused on the fine details that matter for understanding how these rocks formed and how fluids move through them.
Capturing Rocks in Different Lights
DeepCarbonate does more than snap a single picture of each spot. The same rock slice is imaged in several ways: under normal transmitted light, under crossed polarizing filters, under reflected light, and sometimes after staining that makes some minerals glow with color while others stay dull. Each lighting mode highlights different features—crystal shapes, pore spaces, or organic residues that may hint at hydrocarbons. Together they provide a richer view than any one image alone. All images are taken at a consistent magnification chosen to balance detail with field of view, then passed through a strict quality check so that blurred, too-dark, or damaged pictures are removed.
Calling in a Panel of Human Experts
Because subtle rock features can be tricky to interpret, the team does not rely on a single opinion. Ten specialists in carbonate rocks independently review the images and the proposed labels. If too many of them disagree with the initial label for an image, that image is discarded rather than risk teaching computers from doubtful examples. The remaining pictures are sorted into 22 distinct rock categories, ranging from fine mudstones and fossil-rich limestones to fracture-filled rocks, foamy pore networks, and microbial structures such as stromatolites and thrombolites. This broad coverage draws on decades of classic rock classification systems and repackages them for the age of data-driven geology.
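The expert-panel filter described above can be pictured with a small sketch. The article does not say how many dissenting experts count as "too many", so the `max_disagree` threshold, the function name, and the image IDs below are all hypothetical and only illustrate the idea.

```python
def filter_by_consensus(reviews, max_disagree=3):
    """Keep images whose proposed label is rejected by at most `max_disagree` reviewers.

    reviews: dict mapping image ID -> list of booleans, one per expert
             (True = the expert agrees with the proposed label).
    max_disagree: assumed cutoff; the real threshold is not given in the article.
    """
    kept = []
    for image_id, votes in reviews.items():
        disagreements = votes.count(False)
        if disagreements <= max_disagree:
            kept.append(image_id)
    return kept


# Toy data: three hypothetical images, each reviewed by ten experts.
reviews = {
    "img_001": [True] * 10,               # unanimous agreement
    "img_002": [True] * 6 + [False] * 4,  # four experts disagree
    "img_003": [True] * 8 + [False] * 2,  # two experts disagree
}

kept = filter_by_consensus(reviews)
print(kept)  # ['img_001', 'img_003']
```

Discarding contested images trades dataset size for label reliability, which matches the article's point that doubtful examples are removed rather than used for training.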
Building a Fair Testbed for AI
Once labeled, the images are reorganized into a structure that machine-learning researchers already know from landmark vision datasets. The collection—over 55,000 images in total—is split into training, validation, and test subsets under each lighting mode. The authors then put a suite of popular image-recognition networks, from ResNet and VGG to MobileNet and EfficientNet, through their paces on this new playground. They measure not just how often each model gets the rock type exactly right, but also how well it ranks the correct answer among its top guesses and how fairly it handles both common and rare rock classes.
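As a rough illustration of the three kinds of measurement mentioned above, the sketch below computes top-1 accuracy, top-k accuracy, and a class-averaged recall on toy scores. The article does not name the exact metrics used in the paper, so treating "fair handling of rare classes" as macro-averaged recall is an assumption, and the function names and data are invented for the example.

```python
from collections import defaultdict


def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for row, true in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true in ranked[:k]:
            hits += 1
    return hits / len(labels)


def macro_recall(scores, labels):
    """Average of per-class recall, so a rare class weighs as much as a common one."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row, true in zip(scores, labels):
        predicted = max(range(len(row)), key=lambda c: row[c])
        total[true] += 1
        correct[true] += int(predicted == true)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy example: four images, three hypothetical classes (0 is common, 1 is rare).
scores = [
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],  # true class is 1, but class 0 scores highest
]
labels = [0, 0, 0, 1]

print(top_k_accuracy(scores, labels, k=1))  # 0.75
print(top_k_accuracy(scores, labels, k=2))  # 1.0
print(macro_recall(scores, labels))         # 0.5
```

In this toy case plain accuracy looks respectable at 0.75, while the class-averaged score drops to 0.5 because the single rare-class image is misclassified. That gap is the kind of effect a fairness-aware metric is meant to expose.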

What the Machines Learned About Rocks
The results show that DeepCarbonate is challenging but learnable: modern networks can correctly classify most images, with lighter, more efficient models often doing especially well. The study also reveals how uneven class sizes—the fact that some rock types are far more common in the dataset than others—can bias the algorithms toward “frequent” rocks. By creating a more balanced subset using only the nine best-represented classes, the authors show that performance improves and the models focus more clearly on the truly diagnostic features in the images. Including all the different lighting modes together also boosts performance, confirming that the extra visual cues carry real value for the machines, just as they do for human petrographers.
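The idea of restricting the data to its best-represented classes can be sketched as follows. This is a minimal illustration under stated assumptions: the file names and class counts are made up, and the paper's balanced subset may also equalize the number of images per class, which this sketch does not do.

```python
from collections import Counter


def keep_top_classes(samples, n_classes):
    """Return only the samples whose label is among the n most frequent labels.

    samples: list of (image_path, label) pairs.
    """
    counts = Counter(label for _, label in samples)
    top_labels = {label for label, _ in counts.most_common(n_classes)}
    return [sample for sample in samples if sample[1] in top_labels]


# Toy data with hypothetical paths: two common classes and one rare class.
samples = (
    [(f"mudstone_{i}.jpg", "mudstone") for i in range(5)]
    + [(f"stromatolite_{i}.jpg", "stromatolite") for i in range(4)]
    + [("thrombolite_0.jpg", "thrombolite")]
)

subset = keep_top_classes(samples, n_classes=2)
print(len(subset))                              # 9
print(sorted({label for _, label in subset}))   # ['mudstone', 'stromatolite']
```

With the real dataset, the same call with `n_classes=9` would mirror the nine-class subset the article describes, assuming the labels are available as simple (path, label) pairs.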
What This Means for Energy and Earth Science
To a non-specialist, DeepCarbonate is essentially a shared, high-quality picture book of microscopic rocks, paired with a clear set of rules for testing how well computers can “read” it. By making both the images and the code openly available, the authors provide a common yardstick so that future AI tools for rock analysis can be compared fairly. In the long run, this kind of standardized, expert-checked dataset can help turn a slow, hands-on craft into a faster, more objective digital science—supporting better decisions in energy exploration, carbon storage, and our broader understanding of how Earth’s rocky archives record the planet’s history.
Citation: Li, K., Song, J., Zhang, Z. et al. A dataset and benchmark of carbonate thin-section images for deep learning. Sci Data 13, 340 (2026). https://doi.org/10.1038/s41597-026-06633-5
Keywords: carbonate rocks, thin-section images, deep learning, petrography, geological datasets