Clear Sky Science
High-resolution Annotated Dataset of Girvanella Boundstone Microfacies from the Xiannüdong Formation, China
Ancient Reefs Meet Modern Algorithms
Long before corals built today’s tropical reefs, tiny microbes were already assembling complex underwater structures on the seafloor. These fossilized “microbial reefs” record how early life shaped oceans more than 500 million years ago. The new study behind this article does not describe a single fossil find, but instead releases a carefully built, open dataset of microscope images from such ancient reefs in China—formatted specifically so that modern artificial intelligence (AI) systems can learn to read the rock record on their own.

Rocks from a Very Old Shallow Sea
The research focuses on rocks from the Xiannüdong Formation in South China, deposited during the early Cambrian, a time when animal life was rapidly diversifying and marine ecosystems were becoming more complex. These rocks preserve a reef-like structure called Girvanella boundstone, built mainly by filamentous cyanobacteria that left behind calcified tubes and crusts. Mixed with these microbial structures are grains of sediment, skeletal fragments, and mineral cement that filled the spaces between them. Together, these ingredients form a detailed snapshot of an ancient shallow, wave‑stirred seafloor where biology and seawater chemistry worked hand in hand to build solid carbonate platforms.
Turning Rock Slices into Digital Tiles
To make this ancient story usable for computers, the team started with thin slices of reef rock mounted on glass slides and imaged them at high resolution under a polarizing microscope. From 28 original slabs, seven were chosen for detailed processing. Each whole‑slab image was overlaid with a regular grid and cut into small square tiles of 114 by 114 pixels. These tiles are just large enough to capture key textures, such as tangled microbial tubes, fine mud, or coarse grains, yet small enough to serve as standardized units for machine learning, much as pixels are the basic units of an image. This process produced tens of thousands of image tiles that together cover the full variety of micro‑textures found in the rock.
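
For readers who want to see what the tiling step might look like in practice, here is a minimal Python sketch. Only the 114 by 114 pixel tile size comes from the article. The function name `tile_image`, the synthetic image, and the choice to drop partial tiles at the edges are assumptions made for illustration and may differ from the authors' published code.

```python
import numpy as np

TILE = 114  # tile edge length in pixels, as stated in the article


def tile_image(image: np.ndarray, tile: int = TILE):
    """Yield (row, col, tile_array) for every full tile on a regular grid.

    Partial tiles at the right and bottom edges are dropped. That is an
    assumption of this sketch, not a documented choice of the authors.
    """
    height, width = image.shape[:2]
    for r in range(height // tile):
        for c in range(width // tile):
            y, x = r * tile, c * tile
            yield r, c, image[y:y + tile, x:x + tile]


# Synthetic stand-in for a scanned thin section (RGB, 300 x 500 pixels).
scan = np.zeros((300, 500, 3), dtype=np.uint8)
tiles = list(tile_image(scan))
print(len(tiles), "tiles, each of shape", tiles[0][2].shape)
```

In a real workflow the synthetic array would be replaced by a loaded thin‑section image, and each tile would be saved under a name that records its grid position.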

Careful Human Labels for Machine Learning
Digital images alone are not enough; AI also needs examples of what each pattern means. The researchers therefore manually labeled the different components seen in the rock: Girvanella crusts, various types of grains, mud, cement, and other features. They created “mask” images in which one color channel of each pixel stores a numerical class identifier. A Python script then used these masks to assign each tile to one of ten microfacies classes, such as skeletal grainstone, laminated microbialite, or dolomitic mudstone, based on a point‑counting rule that sums pixel values. Tiles with unclear or missing labels were automatically excluded. The final dataset was split into training, validation, and test sets in balanced proportions, and more than 95% agreement was confirmed between the automated labels and manually checked ones.
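
The idea of reading a tile's class from a mask can be illustrated with a short Python sketch. It uses a simple majority count of class identifiers, which is a stand‑in for, and may differ from, the authors' point‑counting rule. The channel index, the value 0 for "unlabeled", the 50% coverage threshold, and the function name `label_tile` are all assumptions made for this example.

```python
import numpy as np

UNLABELED = 0       # hypothetical "no label" value in the mask channel
MIN_COVERAGE = 0.5  # hypothetical minimum share of labeled pixels per tile


def label_tile(mask_tile: np.ndarray, channel: int = 0):
    """Return the dominant class ID in a mask tile, or None to exclude it."""
    ids = mask_tile[..., channel].ravel()
    labeled = ids[ids != UNLABELED]
    # Exclude tiles whose labels are missing or too sparse to trust.
    if labeled.size < MIN_COVERAGE * ids.size:
        return None
    # Count how many pixels carry each class ID and pick the most common.
    counts = np.bincount(labeled)
    return int(counts.argmax())


# Synthetic mask tile: class 3 covers the top 80 rows, class 7 the rest.
mask = np.zeros((114, 114, 3), dtype=np.uint8)
mask[:80, :, 0] = 3
mask[80:, :, 0] = 7
empty = np.zeros((114, 114, 3), dtype=np.uint8)

print(label_tile(mask))   # 3
print(label_tile(empty))  # None
```

Returning `None` for poorly labeled tiles mirrors the automatic exclusion described above, so that ambiguous examples never reach the training set.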
A FAIR Resource for Geology and AI
The finished product is a well‑structured, public dataset hosted on Figshare, following FAIR (Findable, Accessible, Interoperable, Reusable) principles. All tile images are stored as standard PNG files, and their labels and dataset split are documented in a single CSV file. In parallel, the authors provide open‑source Python code on GitHub that reproduces the whole pipeline: slicing the thin‑section images into tiles, reading the hidden labels, checking quality, and organizing the data. This means other researchers can plug the dataset directly into deep learning frameworks, compare competing models on a common benchmark, or adapt the workflow to their own rock collections.
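
As a rough illustration of how such a CSV index could be consumed, the Python sketch below groups tiles by dataset split and counts classes within each split. The column names (`filename`, `label`, `split`), the file names, and the sample rows are invented for this example; the actual layout should be taken from the dataset's documentation on Figshare.

```python
import csv
import io
from collections import Counter, defaultdict

# Made-up sample rows standing in for the real CSV index.
SAMPLE_CSV = """filename,label,split
tile_0001.png,skeletal grainstone,train
tile_0002.png,laminated microbialite,train
tile_0003.png,dolomitic mudstone,val
tile_0004.png,skeletal grainstone,test
"""


def load_index(handle):
    """Group (filename, label) pairs by the split named in each row."""
    by_split = defaultdict(list)
    for row in csv.DictReader(handle):
        by_split[row["split"]].append((row["filename"], row["label"]))
    return by_split


index = load_index(io.StringIO(SAMPLE_CSV))
class_counts = {
    split: Counter(label for _, label in rows)
    for split, rows in index.items()
}
for split, counts in class_counts.items():
    print(split, dict(counts))
```

Counting classes per split in this way is a quick check that the training, validation, and test sets are balanced before any model is trained.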
Why This Matters Beyond One Reef
By transforming a complex ancient reef into an organized library of labeled image tiles, the study builds a bridge between early Earth ecosystems and modern AI tools. For non‑specialists, the takeaway is that interpreting rock textures—once the preserve of expert petrographers peering down microscopes—can increasingly be shared with algorithms trained on openly available data. This dataset will help scientists automate the classification of carbonate rocks, refine reconstructions of long‑vanished seas, and apply transfer learning to other geological settings. In simple terms, it turns a slice of Cambrian seafloor into a reusable teaching set for computers, accelerating our ability to read the planet’s deep history locked in stone.
Citation: Choi, S., Kim, D., Hong, J. et al. High-resolution Annotated Dataset of Girvanella Boundstone Microfacies from the Xiannüdong Formation, China. Sci Data 13, 611 (2026). https://doi.org/10.1038/s41597-026-06958-1
Keywords: Cambrian reefs, carbonate microfacies, geology datasets, deep learning in geoscience, microbial carbonates