Clear Sky Science

A curated resource of chemolithoautotrophic genomes and marker genes for CO₂ fixation pathway prediction


Microbes That Help Balance Earth’s Carbon Budget

Hidden in soils, oceans, and extreme environments, certain microbes can build their own biomass using carbon dioxide (CO₂) as their main carbon source. These microscopic chemists are crucial for keeping Earth’s carbon cycle in balance and may inspire new ways to capture industrial CO₂. Yet until now, scientists have lacked a simple, reliable way to look at a microbe’s DNA and tell which CO₂-fixing strategy it uses. This study introduces a curated gene catalog and a new computer tool, AutoFixMark, designed to fill that gap.

Many Routes to Turn Air into Biomass

Not all organisms that fix CO₂ do it the same way. Microbes have evolved at least seven natural pathways that convert CO₂ into organic matter. Some, like the Calvin–Benson–Bassham cycle common in plants and many bacteria, are well known; others, such as the reductive glycine pathway discovered only in 2020, are still poorly charted. These pathways are scattered across many branches of the tree of life and often reuse similar enzymes, which makes it surprisingly hard to tell them apart by genome sequence alone. Existing software can predict broad metabolic capabilities, but it has not been optimized or thoroughly tested for pinpointing the exact CO₂ fixation routes.

Figure 1.

Building a Clean Reference Map of CO₂-Fixing Microbes

The researchers began by assembling two carefully checked genome collections. First, they selected 15 well-studied microbes whose CO₂ fixation pathways have been worked out in detail. These reference organisms, spanning several bacterial and archaeal groups, served as blueprints for defining the key enzymes that are truly distinctive for each pathway. Next, they created a benchmark set of 347 chemolithoautotrophic genomes—microbes that gain energy from inorganic chemicals and build biomass from CO₂. Each genome in this larger set was linked, by hand from the literature, to specific CO₂ fixation pathways, providing a solid truth set for testing predictions.

Marker Genes and Simple Rules Instead of Black Boxes

Using the 15 reference genomes, the team identified “marker genes” for each of the seven CO₂ fixation pathways and mapped them to standardized KEGG Orthology (KO) identifiers. Instead of relying on opaque machine learning, they encoded transparent rules about how these markers combine. Some reactions can be carried out by any one of several alternative enzymes, handled by a “one_of” rule. Others rely on multisubunit complexes and must have “all_of” a defined set of KOs. For the reductive glycine pathway, where not all components are fully understood, the tool uses “at_least” rules that require a minimum number of subunits to be present. These logical rules are stored in a machine-readable JSON file that forms the core knowledge base for AutoFixMark.
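The three rule types described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the JSON field names ("type", "kos", "min"), the pathway name, and the placeholder identifiers KO_A to KO_G (which are not real KO IDs) do not come from the AutoFixMark knowledge base, and the choice to call a pathway present only when every rule is satisfied is likewise assumed.

```python
import json

# Hypothetical rule file content. The schema and the identifiers are
# placeholders for illustration, not AutoFixMark's actual JSON.
RULES_JSON = """
{
  "example_pathway": [
    {"type": "one_of",   "kos": ["KO_A", "KO_B"]},
    {"type": "all_of",   "kos": ["KO_C", "KO_D"]},
    {"type": "at_least", "kos": ["KO_E", "KO_F", "KO_G"], "min": 2}
  ]
}
"""


def rule_satisfied(rule, genome_kos):
    """Check one marker rule against the set of KOs found in a genome."""
    hits = sum(1 for ko in rule["kos"] if ko in genome_kos)
    kind = rule["type"]
    if kind == "one_of":      # any one of several alternative enzymes
        return hits >= 1
    if kind == "all_of":      # every subunit of a complex must be present
        return hits == len(rule["kos"])
    if kind == "at_least":    # a minimum number of subunits must be present
        return hits >= rule["min"]
    raise ValueError(f"unknown rule type: {kind}")


def predict_pathways(rules, genome_kos):
    """Assumption: a pathway is called present only if all of its rules hold."""
    return {
        pathway: all(rule_satisfied(rule, genome_kos) for rule in steps)
        for pathway, steps in rules.items()
    }


rules = json.loads(RULES_JSON)
complete_genome = {"KO_A", "KO_C", "KO_D", "KO_E", "KO_F"}
partial_genome = {"KO_A", "KO_C", "KO_E", "KO_F"}  # lacks KO_D, so all_of fails

print(predict_pathways(rules, complete_genome))
print(predict_pathways(rules, partial_genome))
```

Because each rule is a plain data record, a failed prediction can be traced back to the specific rule and missing identifier, which is the kind of transparency the rule-based design aims for.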

A Lightweight Tool That Outperforms Established Software

AutoFixMark itself is a small, rule-based program written in Python. It takes as input a list of KO IDs for the genes in a microbe’s genome, typically produced by a separate tool called KofamScan, and then checks which marker rules are satisfied for each of the seven pathways. The authors compared AutoFixMark to two widely used metabolic annotation tools, METABOLIC and gapseq, using their 347-genome benchmark set. All three tools performed well on classic pathways like the Calvin cycle, the reductive tricarboxylic acid cycle, and the Wood–Ljungdahl pathway. However, AutoFixMark clearly outshone the others for newer or less common pathways such as the 3-hydroxypropionate/4-hydroxybutyrate cycle, the dicarboxylate/4-hydroxybutyrate cycle, and the reductive glycine pathway, some of which are not even covered by the competing software.
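A comparison like this one amounts to scoring each tool's predictions against the hand-curated truth set, pathway by pathway. The sketch below shows one plausible way to do that with precision and recall. This summary does not say which metrics the authors used, so the metrics, the function name per_pathway_scores, the genome IDs, and the short pathway labels are all assumptions for illustration.

```python
def per_pathway_scores(truth, predicted):
    """Score predictions per pathway.

    truth and predicted map a genome ID to the set of pathway labels
    assigned to that genome. Genomes missing from `predicted` are treated
    as having no predicted pathways.
    """
    pathways = set().union(*truth.values(), *predicted.values())
    scores = {}
    for pathway in sorted(pathways):
        tp = fp = fn = 0
        for genome, true_set in truth.items():
            called = pathway in predicted.get(genome, set())
            real = pathway in true_set
            if called and real:
                tp += 1          # correctly predicted
            elif called:
                fp += 1          # predicted but not in the truth set
            elif real:
                fn += 1          # in the truth set but missed
        scores[pathway] = {
            "tp": tp,
            "fp": fp,
            "fn": fn,
            # None marks an undefined ratio (no calls, or no true cases).
            "precision": tp / (tp + fp) if tp + fp else None,
            "recall": tp / (tp + fn) if tp + fn else None,
        }
    return scores


# Hypothetical toy data: three genomes, three pathway labels.
truth = {"g1": {"CBB"}, "g2": {"rTCA"}, "g3": {"CBB", "rGly"}}
predicted = {"g1": {"CBB"}, "g2": {"CBB", "rTCA"}, "g3": {"CBB"}}

scores = per_pathway_scores(truth, predicted)
for pathway, result in scores.items():
    print(pathway, result)
```

In this toy data, a tool that never calls the "rGly" label gets a recall of zero for it, which mirrors how software that does not cover a pathway at all would fare on genomes that use it.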

Figure 2.

What These Results Mean for Climate and Ecology Studies

The curated gene sets, the AutoFixMark program, and the full benchmark genome collection are publicly available. This means researchers can now screen both isolated microbes and metagenome-assembled genomes to see which CO₂ fixation strategies they are genetically equipped to use. Importantly, the authors stress that AutoFixMark predicts genetic potential, not whether a pathway is active under real-world conditions. Many of these biochemical routes can also run in reverse, depending on the cell’s energy balance, so finding the marker genes does not by itself show that a microbe is fixing CO₂. Even so, having a robust, transparent way to flag CO₂-fixing microbes will help scientists map where and how life pulls carbon out of the atmosphere, guide experiments on emerging pathways, and support the design of future CO₂-based biotechnologies.

Citation: Kawashima, S., Okabeppu, Y., Miyazawa, S. et al. A curated resource of chemolithoautotrophic genomes and marker genes for CO₂ fixation pathway prediction. Sci Data 13, 121 (2026). https://doi.org/10.1038/s41597-026-06655-z

Keywords: microbial carbon fixation, autotrophic metabolism, genome annotation, CO₂ capture, metagenomics