Clear Sky Science · en

A watershed-scale potential pathogenic bacteria dataset from the Yangtze River Basin

Why this river’s hidden microbes matter

The Yangtze River Basin is the lifeline for hundreds of millions of people and a major source of drinking water in China. Yet along with fresh water, the river also carries invisible hitchhikers—bacteria that can cause disease. This study does not report an outbreak or a pollution scare. Instead, it builds a detailed map and catalog of potentially harmful bacteria along the entire 6,300-kilometer river system, giving scientists and water managers a new, data-rich way to keep watch over microbial health risks before they turn into crises.

Figure 1.

Taking the pulse of a giant river

Large rivers are natural highways that connect mountains, cities, farms, and wetlands. They collect runoff from crowded towns, industrial zones, livestock operations, and wildlife habitats, making them hotspots where disease-causing microbes can accumulate and spread. Until now, efforts to track these organisms in the Yangtze River Basin have been scattered in space and time. The authors of this study set out to stitch those fragments together into a single, basin-wide picture that shows where potentially dangerous bacteria are found, how diverse they are, and how they vary across different parts of the river and its surroundings.

Reading the DNA in water, mud, and soil

Instead of relying on traditional culture methods that grow a few familiar indicator bacteria in the lab, the team turned to metagenomics—the direct sequencing of all DNA in an environmental sample. They compiled 625 metagenomic datasets from earlier research and from their own field campaign, covering water, bottom sediments, and riverside soils along the upper, middle, and lower reaches of the Yangtze and across wet and dry seasons. After strict quality control, 586 high-quality samples remained. This approach allowed them to search, in one sweep, for many kinds of bacteria at once, including those that are rare or hard to grow.

Finding suspect bacteria with genetic fingerprints

To pick out potential pathogens from the flood of DNA sequences, the researchers used a specialized bioinformatic pipeline based on “genome-specific markers.” These markers are short DNA snippets that are unique to a given bacterial species, like barcodes. Drawing on global lists of medically important bacteria from the World Health Organization and other agencies, they assembled a reference library of hundreds of pathogen genomes and then extracted more than 700,000 such markers. By matching metagenomic reads from the Yangtze samples exactly to these markers, and applying conservative filtering rules, they created a high-confidence inventory of species that are known or suspected to cause infections in humans or animals.
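The marker-matching idea can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the toy marker library, the species names, the `detect_species` function, and the `min_markers` threshold are all hypothetical, and the summary does not state what the study's actual filtering rules were. The sketch only shows the general pattern of exact matching followed by a conservative cutoff.

```python
from collections import defaultdict

# Hypothetical toy marker library: species name -> set of marker sequences.
# Real genome-specific markers would be longer and far more numerous.
MARKERS = {
    "Species A": {"ACGTACGTAA", "TTGACCGGTA"},
    "Species B": {"GGGCCCATAT"},
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def detect_species(reads, markers, min_markers=2):
    """Return species supported by at least `min_markers` distinct markers.

    A marker counts as matched only if it appears exactly (no mismatches)
    in a read or in the read's reverse complement. The threshold is an
    assumed stand-in for a conservative filtering rule.
    """
    hits = defaultdict(set)
    for read in reads:
        candidates = (read, reverse_complement(read))
        for species, marker_set in markers.items():
            for marker in marker_set:
                if any(marker in c for c in candidates):
                    hits[species].add(marker)
    return {sp for sp, found in hits.items() if len(found) >= min_markers}


# Made-up reads: two support Species A (one on the reverse strand),
# one supports Species B with a single marker only.
reads = ["CCACGTACGTAACC", "GGTACCGGTCAAGG", "AAGGGCCCATATAA"]
print(detect_species(reads, MARKERS))
```

With the assumed threshold of two distinct markers, only "Species A" is reported; lowering `min_markers` to 1 would also admit "Species B", which shows how a stricter rule trades sensitivity for confidence.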

Figure 2.

What lives where along the Yangtze

The analysis revealed 403 potential pathogenic bacterial species across the basin: 393 found in water, 138 in sediment, and 51 in riverside soils. Because a species can occur in more than one habitat, these counts overlap and add up to more than 403. Commonly detected species included several Acinetobacter and Sphingomonas types, along with other bacteria previously associated with clinical or environmental infections. Using geographic information software, the team translated these findings into maps that show richness, the number of different potential pathogen species, at each sampling site. Separate maps for water, sediment, and soil highlight how microbial communities differ between habitats and change from the river’s upper reaches to its lower, more densely populated stretches, offering a spatial framework for future risk studies.
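Richness as described here is a count of distinct species, which a short Python sketch can make concrete. The records, site names, and function names below are invented for illustration and are not taken from the dataset. The sketch also shows why per-habitat totals can sum to more than the basin-wide total when a species appears in several habitats.

```python
from collections import defaultdict

# Hypothetical detection records: (site, habitat, species).
records = [
    ("site_1", "water", "sp_a"),
    ("site_1", "water", "sp_b"),
    ("site_1", "water", "sp_a"),      # repeat detection, counted once
    ("site_1", "sediment", "sp_a"),
    ("site_2", "water", "sp_c"),
    ("site_2", "soil", "sp_b"),
]


def richness_by_site(records):
    """Number of distinct species per (site, habitat) pair."""
    seen = defaultdict(set)
    for site, habitat, species in records:
        seen[(site, habitat)].add(species)
    return {key: len(spp) for key, spp in seen.items()}


def richness_by_habitat(records):
    """Number of distinct species per habitat across all sites."""
    seen = defaultdict(set)
    for _site, habitat, species in records:
        seen[habitat].add(species)
    return {habitat: len(spp) for habitat, spp in seen.items()}


site_richness = richness_by_site(records)
habitat_richness = richness_by_habitat(records)
total_species = len({species for _, _, species in records})

print(site_richness)
print(habitat_richness)
print(total_species)
```

In this toy data the habitat totals are 3, 1, and 1, while only 3 distinct species exist overall, because two of them occur in more than one habitat.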

What this means for water safety

This work does not claim that every detected bacterium poses an immediate danger. Many of the species are opportunistic pathogens that cause disease only under certain conditions. The study is also limited to bacteria and leaves out viruses, protozoa, and fungi. Within those limits, the dataset serves as a detailed baseline and early-warning tool. By providing a standardized, genome-level catalog and richness maps for the entire Yangtze Basin, the study gives public health and environmental agencies a way to track changes over time, compare different regions, and link microbial patterns to factors such as land use, wastewater discharge, and climate-driven extremes. In simple terms, it turns the Yangtze’s invisible microbial world into a measurable, mappable resource for smarter water safety management.

Citation: Wang, J., Wang, S., Li, T. et al. A watershed-scale potential pathogenic bacteria dataset from the Yangtze River Basin. Sci Data 13, 581 (2026). https://doi.org/10.1038/s41597-026-06983-0

Keywords: Yangtze River, waterborne pathogens, metagenomics, microbial water quality, environmental health