Clear Sky Science · en

BiG-SCAPE 2.0 and BiG-SLiCE 2.0: scalable, accurate and interactive sequence clustering of metabolic gene clusters

· Back to index

Hidden Chemical Treasures in Microbial DNA

Many of the medicines and crop protection agents we rely on come from tiny molecules made by microbes. These organisms hide the recipes for such molecules in stretches of DNA called gene clusters. As DNA sequencing races ahead, researchers are drowning in data, but still know only a small fraction of what microbes can make. This article introduces BiG-SCAPE 2.0 and BiG-SLiCE 2.0, two upgraded software tools that help scientists sift through vast genomic archives to map, compare, and organize these hidden “molecular factories,” bringing the next generation of antibiotics and agricultural compounds closer to discovery.

Figure 1
Figure 1.

Why Gene Clusters Matter for Health and Agriculture

Microbes use specialized small molecules to compete, communicate, and adapt to their surroundings. The DNA blueprints for producing or breaking down these molecules are often grouped together in metabolic gene clusters. These include biosynthetic gene clusters that build complex natural products, and catabolic gene clusters that let microbes feed on particular compounds or root exudates. Because genes in a cluster act together, finding one such region in a genome is like spotting a self-contained “factory line” that can hint at a molecule’s structure and function. Genome-mining tools already detect such factories in bacteria and fungi, but the real challenge is comparing hundreds of thousands of clusters to see how they are related and what chemical diversity they might hold.

Two Engines for Sorting Molecular Factories

BiG-SCAPE and BiG-SLiCE were originally created to group gene clusters with similar core features into “gene cluster families.” Each family is expected to produce the same or closely related molecules. BiG-SCAPE builds detailed networks of similarities between clusters, while BiG-SLiCE is tuned for speed, able to handle millions of clusters by turning them into simple numerical fingerprints and then clustering those fingerprints. Together, they underpin a growing ecosystem of genome-mining pipelines, databases, and interactive viewers that help researchers navigate microbial chemistry at a planetary scale.

What’s New in BiG-SCAPE 2.0

The 2.0 version of BiG-SCAPE introduces a series of upgrades aimed at both biology and computation. It now understands the more refined “region” concept used by the widely adopted antiSMASH tool, which separates overlapping or hybrid gene clusters into smaller, more meaningful building blocks called protoclusters. New alignment modes and strategies allow BiG-SCAPE 2.0 to home in on the truly important core genes inside each cluster, coping better with rearranged genes and fuzzy cluster boundaries. Under the hood, the codebase has been completely rewritten for speed and sustainability, using a shared SQLite database and a modern Python library for profile searches. As a result, BiG-SCAPE 2.0 can run up to eight times faster than its predecessor, while using about half the memory, and now offers multiple ready-made workflows for clustering, querying, de-duplicating, and benchmarking gene clusters through an upgraded interactive web interface.

Figure 2
Figure 2.

How BiG-SLiCE 2.0 Keeps Up with the Data Deluge

BiG-SLiCE 2.0 focuses on making ultra-large analyses more accurate without losing its hallmark speed. Earlier versions treated all gene cluster types in the same way, which unintentionally favored some families over others. By switching to a cosine-like distance measure and updating its library of biosynthetic protein signatures to the latest standards, BiG-SLiCE 2.0 now groups very different kinds of clusters more evenly. Code optimizations and the move to the same fast profile-search library as BiG-SCAPE bring additional speedups, and new options to export all results as simple text tables make it easier to plug BiG-SLiCE into other analysis pipelines. Tests against nine datasets of manually curated gene families show that BiG-SLiCE 2.0’s accuracy now approaches that of BiG-SCAPE, especially for shorter and more elusive gene clusters.

Revealing a Vast, Untapped Chemical Universe

The authors used both tools to examine 260,630 biosynthetic regions from a public database of microbial genomes. BiG-SCAPE 2.0 and BiG-SLiCE 2.0 produced remarkably similar estimates of how many distinct gene cluster families exist in this dataset, supporting earlier work that only about 3% of the biosynthetic potential encoded in bacterial genomes has been characterized so far. In other words, the overwhelming majority of microbe-made chemicals remains unknown. By making it possible to accurately cluster and visualize gene clusters across hundreds of thousands of genomes—and eventually millions—BiG-SCAPE 2.0 and BiG-SLiCE 2.0 provide powerful lenses for exploring this uncharted chemical universe, paving the way for new drugs, safer crop protection tools, and deeper insights into how microbes shape ecosystems and our own health.

Citation: Draisma, A., Loureiro, C., Louwen, N.L.L. et al. BiG-SCAPE 2.0 and BiG-SLiCE 2.0: scalable, accurate and interactive sequence clustering of metabolic gene clusters. Nat Commun 17, 2000 (2026). https://doi.org/10.1038/s41467-026-68733-5

Keywords: biosynthetic gene clusters, natural product discovery, genome mining, microbial metabolites, computational clustering