Clear Sky Science
High-Quality Genome Assemblies of Two Prototheca wickerhamii Strains
Why this tiny alga matters to our health
Most of us think of algae as harmless green scum on ponds, powered by sunlight. But some algae relatives have shed their green pigment and turned into stealthy germs that can infect people and animals. One such culprit, Prototheca wickerhamii, causes rare but stubborn infections of skin, soft tissue, and occasionally deeper organs. Doctors struggle with it partly because its basic biology is still poorly understood. This study delivers high‑quality blueprints of the DNA from two clinical strains of this microbe, giving researchers a detailed parts list that can help explain how it survives in the body and how we might better diagnose and treat the infections it causes. 
A colorless cousin hiding in plain sight
Prototheca wickerhamii belongs to a little‑known group of “colorless” microalgae that no longer perform photosynthesis. Instead of living off sunlight like their green relatives, they subsist in moist environments and sometimes inside warm‑blooded hosts. Over the past two decades, reported infections caused by these organisms have risen, especially in people with weakened immune systems and in companion animals. Yet the true burden is likely underestimated, because Prototheca can be missed or misidentified in routine lab tests. Earlier work decoded the DNA of one reference strain and suggested that the organism carries many genes similar to known virulence factors in disease‑causing fungi, hinting that its genome has been shaped to thrive in the human body.
Collecting and reading the microbe’s DNA
In the new study, scientists focused on two clinical strains, named Pw26 and PwS1, isolated from patients in different Chinese cities. They first grew pure colonies on standard lab media and confirmed that no other microbes contaminated the cultures. The team then extracted high‑quality DNA and used a modern long‑read method called PacBio HiFi sequencing. Unlike older techniques that chop DNA into very short fragments, HiFi reads span tens of thousands of bases at a time with high accuracy. This makes it easier to reconstruct entire chromosomes with few gaps. The researchers generated more than a billion and a half bases of sequence for Pw26 and over eight hundred million for PwS1, providing deep coverage of both genomes.
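To give a feel for what "deep coverage" means, the short sketch below divides the amount of sequence generated by the size of each genome. It uses only the round figures quoted in this article (the genome sizes come from the assembly described in the next section), and the sequencing yields are lower bounds, so the results are rough minimum estimates rather than values reported by the study. The function name `fold_coverage` is made up for illustration.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total bases read divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Round figures taken from this article. The yields are lower bounds
# ("more than" 1.5 billion bases, "over" 800 million bases), so the
# computed depths are approximate minimums, not published values.
strains = {
    "Pw26": {"bases_sequenced": 1.5e9, "genome_size": 17.8e6},
    "PwS1": {"bases_sequenced": 8.0e8, "genome_size": 17.4e6},
}

coverage = {
    name: fold_coverage(v["bases_sequenced"], v["genome_size"])
    for name, v in strains.items()
}

for name, depth in coverage.items():
    print(f"{name}: at least ~{depth:.0f}x average coverage")
```

On these numbers, each position in the Pw26 genome was read roughly 84 times on average and each position in PwS1 roughly 46 times, which is why random sequencing errors can be outvoted during assembly.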
Building complete genomes and finding repeated patterns
Specialized assembly software stitched the long DNA reads into continuous stretches representing the organism’s chromosomes. The final genome sizes were about 17.8 million bases for Pw26 and 17.4 million for PwS1, similar to but slightly larger than the previously studied strain. Each was assembled into only 14 to 17 pieces, and statistical checks showed that most expected core genes were present, a sign of completeness. The team then searched for repeated DNA elements, which can shape how genomes evolve. These repeats made up roughly 6 percent of Pw26 and 4 percent of PwS1, dominated by a class called long terminal repeats often seen in plant and algal genomes. Subtle differences in the amount and type of repeats between the two strains may reflect how each has adapted to different environments or hosts.
What the genes say about how the microbe lives
After masking out repeats, the researchers predicted protein‑coding genes using a combination of three approaches: computer models trained on gene structure, comparison with known proteins from related algae and Prototheca strains, and alignment of previously collected RNA data. This yielded around 6,400 genes in each genome. They then annotated these genes using two widely used catalogues of gene function. One, called Gene Ontology, groups genes by the kinds of tasks they perform in the cell, while the KEGG database maps them to metabolic pathways. Both strains had many genes involved in energy production, breaking down and building up nutrients, and regulating cellular processes. PwS1 showed extra emphasis on lipid‑related pathways and signaling, echoing earlier findings that linked this strain’s unusual mucoid appearance and lower toxicity to changes in its surface and metabolism. 
Checking accuracy and comparing the two strains
To ensure that their reconstructions were reliable, the team remapped the original long reads onto each assembled genome. Over 93 percent of reads matched back with even coverage, and the pattern of base composition showed no signs of contamination. Another quality check, called BUSCO, confirmed that more than 86 percent of a standard set of conserved algal genes were present and intact in both strains. Finally, when the two genomes were lined up using whole‑genome comparison tools, their DNA segments matched almost one‑to‑one, indicating a very high degree of similarity and supporting the idea that the assemblies accurately capture the underlying chromosomes.
What this means for future diagnosis and treatment
For non‑specialists, the main message is that we now have detailed, trustworthy DNA maps for two disease‑causing strains of Prototheca wickerhamii. These maps do not by themselves cure infections, but they provide the foundation for asking sharper questions: which genes allow the microbe to evade the immune system, which pathways could be targeted by existing drugs, and how do different strains vary in virulence and drug response? Because the data have been made publicly available, laboratories worldwide can use them to design better diagnostic tests, track outbreaks from a One Health perspective that links human and animal health, and eventually inform more precise treatment strategies for this uncommon but challenging pathogen.
Citation: Fang, L., Guo, J., Ning, Q. et al. High-Quality Genome Assemblies of Two Prototheca wickerhamii Strains. Sci Data 13, 633 (2026). https://doi.org/10.1038/s41597-026-06916-x
Keywords: Prototheca wickerhamii, genome assembly, opportunistic infection, long-read sequencing, pathogen genomics