Clear Sky Science

Comprehensive re-assembly and annotation dataset for the argan tree (Argania spinosa L., Sapotaceae) genome


Why this desert tree matters to you

The argan tree may look like a scraggly shrub clinging to dry Moroccan hillsides, but it fuels a global market in culinary and cosmetic oils and helps anchor fragile ecosystems. This study dives into the tree’s DNA, building one of the most complete genetic roadmaps yet for Argania spinosa. That map will help scientists protect wild forests, improve oil yield and quality, and understand how this hardy tree survives heat and drought—issues that matter far beyond Morocco as the climate warms.

Getting to know the argan tree

Argan trees grow almost exclusively in southwestern Morocco, where they cover close to a million hectares in a region that UNESCO has recognized as a biosphere reserve. Local communities rely on them for wood, fodder, and especially argan oil, prized for its rich flavor and its use in skin and hair products. The oil’s value comes from its high levels of healthy unsaturated fats and natural antioxidants such as vitamin E. Yet until recently, scientists had only fragments of the tree’s genetic information, mostly from its light-harvesting chloroplasts and its energy-producing mitochondria. The main instruction book, the nuclear genome in the cell’s core, had been read only in rough draft, with many gaps and little detail on important genes.

Figure 1.

Building a cleaner genetic blueprint

In this work, researchers went back to raw DNA data they had already collected from a single tree known as “Argan Amghar.” Using advanced computer tools, they cleaned the data, removed traces of non-plant DNA, and stitched the short pieces of genetic code together into much longer stretches. The result is a nuclear genome of about 690 million DNA letters, organized into hundreds of pieces called scaffolds. Eleven very large scaffolds together hold about half of all the genetic material, giving scientists a much clearer view of the genome’s overall structure than before.
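The "eleven scaffolds hold about half" figure corresponds to a common assembly summary statistic, usually called L50: sort the scaffolds from longest to shortest and count how many are needed to reach half of the total length. The length of the last scaffold counted is the related N50 value. The short Python sketch below illustrates the idea. The function name and the scaffold lengths are made up for illustration and are not taken from the study.

```python
def scaffolds_for_half(lengths):
    """Return (count, shortest_length) for the largest scaffolds that
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    # Walk through the scaffolds from longest to shortest.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running * 2 >= total:
            return count, length
    return 0, 0  # empty input


# Hypothetical scaffold lengths (millions of DNA letters), not real argan data.
toy_lengths = [40, 30, 20, 5, 3, 2]
count, shortest = scaffolds_for_half(toy_lengths)
print(count, shortest)  # 2 30
```

In this toy case two scaffolds already cover half of the total. The fewer large scaffolds needed, the less fragmented the assembly.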

Finding the genes and hidden repeats

Once the genome was assembled, the team had to locate its genes, the stretches of DNA that carry instructions for making proteins, along with the many non-coding sequences that help regulate them. They used several independent computer programs trained on related plants, such as tea, olive, and the model plant Arabidopsis, then merged their predictions into a single, high-confidence set. In total, they identified just over 51,000 protein-making genes and more than 2,000 genes for other RNA molecules that do not become proteins but still play vital roles in the cell. They also mapped the “repetitive” half of the genome: sequences that copy and paste themselves or appear many times over. About 53% of the argan genome consists of such repeats, a typical pattern for long-lived trees and a key factor in how their genomes evolve.

What the genes seem to do

To move from raw DNA to biological meaning, the researchers compared argan proteins with those from well-studied species and databases of known protein families. Two-thirds of the genes could be linked to at least one likely function or cellular role, and nearly half had close matches in a trusted protein database, lending extra confidence. More than 1,900 genes appear to act as transcription factors—master switches that turn other genes on and off. Over 7,000 genes were tied to known metabolic pathways, including those that build oils and vitamin E–like compounds. These connections give scientists a shortlist of candidate genes that may shape argan oil’s composition, the tree’s response to drought and heat, and other traits important for farmers and industry.

Figure 2.

A shared toolbox for future work

Beyond the headline numbers, the real product of this study is a carefully organized toolbox. The authors provide the assembled genome, a standard file listing every gene and repeat with its exact position, the predicted protein sequences, and tables describing each gene’s likely role. All of it is stored in public databases where any researcher can download and reuse it without repeating the heavy lifting of assembly and annotation. Tests of genome quality show that the vast majority of essential plant genes are present, although some fine details are still missing—especially alternative gene versions and certain regulatory RNAs that will require future experiments.
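For readers curious about what such a position file looks like, the sketch below assumes it follows the widely used GFF3 layout. The article does not name the format, so this is an assumption. In GFF3 each line has nine tab-separated columns, and the third column gives the feature type. The function name and the toy lines are hypothetical and are not drawn from the published dataset.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Count how many features of each type appear in GFF3-style lines."""
    counts = Counter()
    for line in gff_lines:
        line = line.strip()
        # Skip blank lines and comment/directive lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) < 9:
            continue  # not a complete nine-column feature line
        counts[columns[2]] += 1  # column 3 holds the feature type
    return counts


# Hypothetical example lines, invented for illustration only.
toy_gff = [
    "##gff-version 3",
    "scaffold_1\ttoy\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "scaffold_1\ttoy\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "scaffold_1\ttoy\trepeat_region\t1500\t2100\t.\t.\t.\tID=rep1",
    "scaffold_2\ttoy\tgene\t50\t700\t.\t-\t.\tID=gene2",
]
counts = count_feature_types(toy_gff)
print(counts["gene"], counts["repeat_region"])  # 2 1
```

A tally like this is a quick sanity check after downloading an annotation file: the number of gene lines should be close to the gene count the authors report.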

What this means in everyday terms

For non-specialists, this work means that the argan tree now has a detailed genetic “atlas” instead of a rough sketch. With this atlas, scientists can more easily pinpoint genes linked to oil yield and quality, resilience to drought, and resistance to disease. Breeders and conservationists can use this information to design better markers for selecting robust trees, support local livelihoods, and help safeguard a unique ecosystem under pressure from climate change and human use. In short, decoding the argan genome lays the groundwork for keeping this ancient tree, and the communities that depend on it, thriving into the future.

Citation: Idrissi Azami, A., Pirro, S., Habib, N. et al. Comprehensive re-assembly and annotation dataset for the argan tree (Argania spinosa L., Sapotaceae) genome. Sci Data 13, 267 (2026). https://doi.org/10.1038/s41597-026-06596-7

Keywords: argan tree genome, argan oil, plant genetics, drought tolerance, vitamin E biosynthesis