Clear Sky Science
Predicting FOX gene candidates for oxic nitrogen fixation using multi-omic machine learning and comparative bioinformatics
Why turning air into plant food matters
Modern farming depends heavily on industrial fertilizer, which is made by forcing nitrogen from the air into a usable form using enormous amounts of fossil fuel. This process feeds billions of people but also drives climate emissions and water pollution. In nature, however, certain microbes quietly perform the same chemical trick using sunlight and far less energy. This paper explores how to decode and catalogue the genes that let one such microbe, a cyanobacterium, fix nitrogen even while producing oxygen—something that normally shuts this chemistry down. Understanding these genes could point the way to crops and industrial microbes that fertilize themselves.
The balancing act inside a tiny cell
Nitrogen gas makes up most of the air, but plants and animals cannot use it directly. Specialized microbes rely on an enzyme called nitrogenase to convert nitrogen gas into ammonia, a form life can use. Nitrogenase is extremely sensitive to oxygen, which irreversibly inactivates it. Yet some cyanobacteria, including the species Anabaena 7120, perform oxygen-producing photosynthesis and nitrogen fixation in the same filament. They manage this by forming special cells called heterocysts that maintain a low-oxygen environment for nitrogenase. Besides the core nitrogenase genes, many accessory genes are needed to build the protective cell walls, control the internal chemistry, and shuttle electrons and nutrients. Genes whose loss stops growth on nitrogen gas in the presence of oxygen are known as FOX genes, and only a fraction of them are currently known.

Teaching computers to spot missing nitrogen genes
The authors set out to predict new FOX gene candidates across the entire Anabaena 7120 genome using a blend of biological measurements and machine learning. They assembled a “multi-omic” data set that followed how every gene responded when combined nitrogen was removed from the growth medium, a trigger that causes heterocysts to form. This included time-course measurements of RNA levels, changes in protein abundance, features of the DNA control regions that drive transcription, the physical neighborhood of each gene on the chromosome, and how strongly each gene is conserved in other nitrogen-fixing versus non-fixing cyanobacteria. They then labeled 68 genes already proven to be FOX and chose 835 widely conserved, non-essential genes as a stand‑in for the non‑FOX group.
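One of the inputs described above, conservation in nitrogen-fixing versus non-fixing cyanobacteria, can be pictured with a small sketch. It assumes, purely for illustration, that the feature is the fraction of fixer genomes carrying a gene minus the fraction of non-fixer genomes carrying it. The study may compute this differently, and the function name `conservation_contrast` and all genome labels are hypothetical.

```python
def conservation_contrast(present_in, fixers, non_fixers):
    """Fraction of fixer genomes with the gene minus fraction of non-fixer genomes.

    Assumed illustrative feature, not the study's actual definition.
    """
    if not fixers or not non_fixers:
        raise ValueError("both genome groups must be non-empty")
    frac_fix = sum(g in present_in for g in fixers) / len(fixers)
    frac_non = sum(g in present_in for g in non_fixers) / len(non_fixers)
    return frac_fix - frac_non


# Made-up genome labels and presence calls for one hypothetical gene.
fixers = ["fixer_1", "fixer_2", "fixer_3", "fixer_4"]
non_fixers = ["nonfixer_1", "nonfixer_2"]
gene_hits = {"fixer_1", "fixer_2", "fixer_3", "nonfixer_1"}

contrast = conservation_contrast(gene_hits, fixers, non_fixers)
print(contrast)  # 3/4 - 1/2 = 0.25
```

Under this assumed definition, a value near +1 marks a gene kept almost only by nitrogen-fixers, while a value near 0 marks a gene shared equally by both groups.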
How well the models worked and what they learned
Using these labeled examples, the team trained three types of models—logistic regression, Random Forest, and XGBoost—and repeatedly tested them on held‑out genes. All three could reliably rank known FOX genes above the proxy non‑FOX genes, with the best models reaching performance on par with other gene‑essentiality predictors. Importantly, the models were not black boxes: the researchers used a technique called SHAP to see which features pushed a gene toward or away from a FOX‑like prediction. FOX genes tended to be strongly switched on late after nitrogen removal, showed low activity before the switch, appeared in clusters with other diazotrophy genes, and were more conserved in known nitrogen‑fixers than in closely related species that do not fix nitrogen. In contrast, genes commonly shared with non‑fixing cyanobacteria, or arranged in certain promoter layouts associated with housekeeping roles, were less likely to be FOX.
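What it means to rank known FOX genes above the proxy non-FOX genes can be made concrete with a small sketch. It assumes, as an illustration only, that ranking quality is summarized by the area under the ROC curve (AUC): the fraction of (FOX, proxy non-FOX) gene pairs in which the FOX gene gets the higher score. The function name `pairwise_auc` and every score are hypothetical and are not taken from the study.

```python
from itertools import product


def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up model scores for held-out genes (illustrative values only).
fox_scores = [0.9, 0.7, 0.4]
proxy_non_fox_scores = [0.6, 0.3, 0.2, 0.1]

auc = pairwise_auc(fox_scores, proxy_non_fox_scores)
print(f"AUC = {auc:.3f}")  # 11 of 12 pairs are ordered correctly -> 0.917
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking. Because this measure depends only on the order of the scores, it suits a setting where scores are read as a ranking rather than as literal probabilities.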
New gene candidates and a design tool for engineers
Armed with these insights, the authors generated probability scores for every gene in the genome, using them as a ranking rather than as literal odds. Among the highest‑ranked candidates were genes embedded in the heterocyst envelope region, genes tied to redox balancing and electron transport, and several factors known in other organisms to help assemble or support nitrogenase but not yet classified as FOX in Anabaena. Some top‑scoring genes already have independent experimental hints of importance, lending credibility to the approach. To make the results practical for synthetic biology, the team also built a web tool that helps users pick compact sets of candidate genes that fit within a chosen DNA size limit—roughly the scale that has already been moved into other cyanobacteria—using either simple rank‑order or a size‑aware greedy strategy.
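The two selection strategies mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the web tool's actual code: rank-order is assumed to walk down the score ranking and skip any gene that would overflow the size limit, and the size-aware greedy strategy is assumed to order genes by score per base pair. The gene names, scores, lengths, and the 3,000 bp budget are all made up.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str       # hypothetical gene label
    score: float    # model score, used only as a ranking
    length_bp: int  # gene length in base pairs


def _fill(ordered, budget_bp):
    """Add candidates in the given order, skipping any that no longer fit."""
    chosen, used = [], 0
    for c in ordered:
        if used + c.length_bp <= budget_bp:
            chosen.append(c)
            used += c.length_bp
    return chosen


def pick_by_rank(cands, budget_bp):
    """Assumed rank-order strategy: highest score first."""
    return _fill(sorted(cands, key=lambda c: c.score, reverse=True), budget_bp)


def pick_size_aware(cands, budget_bp):
    """Assumed size-aware greedy strategy: highest score per base pair first."""
    ordered = sorted(cands, key=lambda c: c.score / c.length_bp, reverse=True)
    return _fill(ordered, budget_bp)


# Made-up candidates (illustrative values only).
cands = [
    Candidate("geneA", 0.95, 3000),
    Candidate("geneB", 0.90, 1000),
    Candidate("geneC", 0.85, 1200),
    Candidate("geneD", 0.60, 800),
]

by_rank = [c.name for c in pick_by_rank(cands, 3000)]
size_aware = [c.name for c in pick_size_aware(cands, 3000)]
print(by_rank)     # ['geneA']
print(size_aware)  # ['geneB', 'geneD', 'geneC']
```

In this toy case the single top-ranked gene uses the whole budget, while the size-aware ordering fits three shorter candidates into the same space. A greedy ordering of this kind is a heuristic and is not guaranteed to find the best possible set for a given size limit.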

From smarter predictions to smarter crops
For a general reader, the key message is that this work turns a messy, genome-wide hunt into a focused shortlist of genes likely to let oxygen-producing microbes keep fixing nitrogen. The study shows that three kinds of pattern, taken together, form a recognizable signature of oxic nitrogen fixation: when genes turn on, how they are wired into their neighbors, and which species keep or discard them. While every candidate still needs experimental testing, the ranked lists and interactive app give researchers a roadmap for systematically filling in the missing pieces. In the long run, that roadmap could guide efforts to equip crops or industrial microbes with robust, self-contained nitrogen-fixing systems, reducing dependence on energy-hungry fertilizer factories and helping agriculture tread more lightly on the planet.
Citation: Young, J., Gu, L. & Zhou, R. Predicting FOX gene candidates for oxic nitrogen fixation using multi-omic machine learning and comparative bioinformatics. Sci Rep 16, 11412 (2026). https://doi.org/10.1038/s41598-026-41873-w
Keywords: nitrogen fixation, cyanobacteria, machine learning, synthetic biology, heterocyst