Clear Sky Science
Chromosome-level genome assemblies of Nicotiana attenuata (coyote tobacco) and Nicotiana obtusifolia (desert tobacco)
Why wild tobacco genomes matter
Wild relatives of crops often hide genetic secrets that help them survive heat, drought, insects, and disease. This study cracks open the DNA instruction books of two such plants, coyote tobacco and desert tobacco, and reads them at the scale of whole chromosomes rather than scattered fragments. By building nearly gap-free maps of their chromosomes, the work gives biologists a powerful reference for understanding how these plants make potent chemical defenses like nicotine and how they cope with life in harsh environments.
From desert plants to digital blueprints
Coyote tobacco (Nicotiana attenuata) and desert tobacco (Nicotiana obtusifolia) grow naturally in the deserts and canyons of the American Southwest. For years, they have served as model species for studying how plants interact with herbivores, microbes, and pollinators. Earlier attempts to read their genomes produced only rough drafts: the DNA was broken into thousands of pieces, with many gaps and uncertain joins. That level of quality was enough for some questions but made it hard to compare genes across species or to pinpoint the origins of new defensive chemicals.

Building chromosomes with new sequencing tools
The authors revisited these wild tobaccos using modern DNA technologies designed for piecing together very large and repetitive genomes. For coyote tobacco, they started from a previous long-read assembly and layered on “Hi-C” data, which captures how distant parts of the DNA physically sit next to each other inside the cell’s nucleus. Those physical contacts act like clues about which fragments belong on the same chromosome and in what order. Using specialized software, they clustered, ordered, and oriented the DNA pieces into 12 full-length chromosomes, covering almost all of the plant’s roughly 2.2-billion-letter genome.
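The idea that contact counts hint at which fragments share a chromosome can be illustrated with a toy sketch. It is not the specialized software used in the study: the fragment names, contact counts, and the `min_links` threshold below are all hypothetical, and real scaffolding tools also normalize for fragment length and then order and orient the pieces. The sketch only shows the first step, grouping fragments that are linked by many contacts.

```python
from collections import defaultdict


def cluster_contigs(contacts, min_links):
    """Group fragments joined by at least `min_links` Hi-C contacts (union-find)."""
    parent = {}

    def find(x):
        # Register unseen fragments, then walk up to the group's representative.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the path as we go
            x = parent[x]
        return x

    for (a, b), links in contacts.items():
        root_a, root_b = find(a), find(b)
        # Merge two groups only when the contact evidence is strong enough.
        if links >= min_links and root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for contig in list(parent):
        groups[find(contig)].append(contig)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical contact counts between pairs of assembled fragments.
toy_contacts = {
    ("ctg1", "ctg2"): 120,
    ("ctg2", "ctg3"): 95,
    ("ctg4", "ctg5"): 110,
    ("ctg1", "ctg4"): 3,  # weak signal: probably different chromosomes
    ("ctg3", "ctg5"): 2,
}

groups = cluster_contigs(toy_contacts, min_links=50)
print(groups)
```

With this toy input, the strongly linked fragments fall into two groups, standing in for two chromosomes, while the weak cross-links are ignored.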
De novo map of desert tobacco
For desert tobacco, the team built the genome almost from scratch. They generated highly accurate long DNA reads on a PacBio sequencer and assembled these into several hundred long segments. Then, as with coyote tobacco, they used Hi-C contact patterns to stitch these segments into 12 chromosomes totaling about 1.3 billion DNA letters. Additional checks ensured that stray fragments from microbes or other contaminants were removed, leaving a clean and compact representation of the plant’s genetic material.
Finding genes amid oceans of repeats
Both genomes turned out to be dominated by repetitive DNA, which makes up about four-fifths of their length and is notoriously difficult to assemble. The new long-read and Hi-C strategy handled this complexity well, allowing the researchers to identify more than 35,000 protein-coding genes in coyote tobacco and over 27,000 in desert tobacco. They combined evidence from RNA molecules made in different tissues and from related nightshade-family species to refine gene predictions. Independent quality tests showed that nearly all expected core plant genes are present and intact, and that long repeat elements are accurately represented—hallmarks of reference-grade genomes.
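A quick back-of-the-envelope calculation puts these figures side by side. It uses only the rounded numbers quoted in this summary (genome sizes of about 2.2 and 1.3 billion letters, roughly four-fifths repeats, and gene counts taken as the lower bounds 35,000 and 27,000), so the results are rough estimates rather than values reported by the authors. The function name `summarize` is made up for illustration.

```python
def summarize(size_bp, genes, repeat_fraction=0.8):
    """Return (repetitive DNA in billions of letters, genes per million letters)."""
    repeat_gb = size_bp * repeat_fraction / 1e9
    genes_per_mb = genes / (size_bp / 1e6)
    return repeat_gb, genes_per_mb


# Rounded figures from the text; gene counts are lower bounds.
genomes = {
    "coyote tobacco": (2.2e9, 35_000),
    "desert tobacco": (1.3e9, 27_000),
}

for name, (size_bp, genes) in genomes.items():
    repeat_gb, genes_per_mb = summarize(size_bp, genes)
    print(f"{name}: ~{repeat_gb:.2f} billion repetitive letters, "
          f"~{genes_per_mb:.1f} genes per million letters")
```

Under these assumptions, the smaller desert tobacco genome packs its genes somewhat more densely than the larger coyote tobacco genome.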

A foundation for studying plant defenses
To confirm that the assemblies are reliable, the team examined several lines of evidence: Hi-C contact maps that align neatly along chromosome diagonals, statistical measures of base accuracy, and standardized completeness scores that reach levels typical of the best plant genomes. With these robust DNA blueprints now available in public databases, researchers can more easily trace how nicotine and other specialized chemicals evolved, compare gene networks controlling plant–insect battles, and explore why closely related species respond differently to environmental stress. In simple terms, this study turns two previously fuzzy genetic pictures into sharp, full-page spreads, creating a foundation for future discoveries in plant ecology, evolution, and crop improvement.
Citation: Chakraborty, A., Xu, S. Chromosome-level genome assemblies of Nicotiana attenuata (coyote tobacco) and Nicotiana obtusifolia (desert tobacco). Sci Data 13, 441 (2026). https://doi.org/10.1038/s41597-026-07080-y
Keywords: wild tobacco genomes, plant chemical defenses, chromosome-level assembly, Solanaceae genetics, Hi-C sequencing