Clear Sky Science · en

A chromosome-level reference genome of an endangered plant Craigia yunnanensis

A tree on the brink and why its DNA matters

High in the limestone mountains of southern China and northern Vietnam grows a little-known tree called Craigia yunnanensis. Once more widespread, it now survives in scattered patches of forest and is officially recognized as a threatened species. This study built the first detailed “road map” of the tree’s entire genetic makeup, a resource that can help scientists understand how it adapts to its environment and guide smarter efforts to keep it from disappearing.

Figure 1.

Where this rare tree survives today

Craigia yunnanensis is a deciduous tree in the mallow family, related to plants such as cacao and durian. It is found only in East Asia, and today mainly clings to life on rocky limestone slopes in Yunnan Province and neighboring northern Vietnam. Decades of deforestation and habitat fragmentation have shrunk its natural range and left only small, isolated groups of trees. Because these remaining populations are so scattered, there is a real risk that they will lose genetic diversity over time, making them less able to cope with disease, pests, or climate change.

From forest to genome blueprint

To capture the species’ genetic blueprint, the researchers first collected roots, stems, leaves, and young root tips from wild trees in Yunnan. They froze these tissues immediately to preserve the DNA and RNA inside. Using a mix of modern sequencing technologies, they then read the tree’s DNA at very high resolution. Long, highly accurate DNA reads from PacBio HiFi sequencing were combined with shorter Illumina reads and a special “Hi-C” technique, which reveals how pieces of DNA are physically folded and packed inside chromosomes. This combination allowed the team not just to read the genetic code, but to assemble it into long, continuous stretches that correspond to real chromosomes inside the cell.

Building chromosomes and finding genes

The finished genome adds up to about 1.62 billion DNA letters, similar to many other tree species. The team was able to organize 98% of this sequence into 41 distinct chromosomes, matching the 41 chromosome pairs they had seen under the microscope in the tree’s cells. Standard quality tests showed that the assembly is both very complete and very accurate: nearly all expected core plant genes were present and properly assembled. The researchers then combined several lines of evidence (comparisons with well-studied plants, the tree’s own RNA, and computer predictions) to identify almost 59,000 regions that code for proteins, and they were able to find likely functions for the vast majority of them.

The hidden majority: repeated DNA and small RNAs

Like many plant genomes, most of this tree’s DNA does not fall within classical genes. Roughly 72% of the genome consists of repeated sequences, dominated by a type of jumping genetic element called long terminal repeat retrotransposons. The team also cataloged thousands of small non-coding RNA genes, including tiny regulators (microRNAs), the transfer RNAs that help build proteins, and components of the cellular machinery that processes RNA. Together, these elements influence how genes are switched on and off and how the tree responds to stress, even though they never become proteins themselves.
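To give a feel for the scale of these numbers, the short Python sketch below converts the rounded figures quoted in this summary (1.62 billion letters, 98% placed on 41 chromosomes, about 72% repeats, almost 59,000 protein-coding regions) into approximate sizes. The function and variable names are illustrative assumptions, not part of the study, and because the inputs are rounded the outputs are only rough estimates.

```python
# Rounded figures as quoted in this summary; derived values are approximate.
GENOME_SIZE_BP = 1.62e9        # about 1.62 billion DNA letters
ANCHORED_FRACTION = 0.98       # share of sequence placed on chromosomes
CHROMOSOMES = 41               # chromosomes in the assembly
REPEAT_FRACTION = 0.72         # "roughly 72%" repeated sequence
PROTEIN_CODING_GENES = 59_000  # "almost 59,000", so treat as an upper bound


def summarize(genome_bp, anchored_frac, n_chrom, repeat_frac, n_genes):
    """Turn percentages into approximate sizes in megabases (Mb)."""
    anchored_bp = genome_bp * anchored_frac
    return {
        "anchored_mb": anchored_bp / 1e6,
        "unplaced_mb": (genome_bp - anchored_bp) / 1e6,
        "mean_chromosome_mb": anchored_bp / n_chrom / 1e6,
        "repeat_mb": genome_bp * repeat_frac / 1e6,
        "genes_per_mb": n_genes / (genome_bp / 1e6),
    }


summary = summarize(
    GENOME_SIZE_BP,
    ANCHORED_FRACTION,
    CHROMOSOMES,
    REPEAT_FRACTION,
    PROTEIN_CODING_GENES,
)

for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions, roughly 1.17 billion letters are repeats, an average chromosome is a little under 39 million letters long, and there are about 36 protein-coding regions per million letters.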

Figure 2.

Comparing relatives and confirming the picture

To test how reliable their genome really is, the scientists mapped their raw DNA reads back onto the assembly and found that almost all of them aligned well, a sign of high accuracy. They also compared this new diploid genome, which represents the ordinary two-copy chromosome set, to a previously published, more complex assembly from plants with four chromosome copies (an autotetraploid form). The patterns of matching DNA and the rates of silent DNA changes between gene pairs (changes that do not alter the encoded protein) showed that the two assemblies are closely aligned and essentially describe the same species. This cross-check gives extra confidence that conservation genetic studies can safely build on this new reference.

How this genome can help save a species

By turning a rare tree’s DNA into a detailed, chromosome-level map, this work provides a powerful toolkit for conservation. Scientists can now pinpoint which populations hold unique genetic variants worth protecting, track how past climate shifts shaped the species, and identify genes involved in stress tolerance or local adaptation. Conservation planners, in turn, can use these insights to design seed collections, breeding programs, and habitat restorations that maintain the tree’s evolutionary potential. In short, this genome transforms Craigia yunnanensis from a poorly understood forest relic into a species whose future can be guided by precise genetic knowledge rather than guesswork.

Citation: Cheng, Z., Xing, Y., Pan, Y. et al. A chromosome-level reference genome of an endangered plant Craigia yunnanensis. Sci Data 13, 567 (2026). https://doi.org/10.1038/s41597-026-06746-x

Keywords: endangered plants, reference genome, forest conservation, plant genetics, Craigia yunnanensis