Clear Sky Science · en

Sequence to structure insights into Lassa virus population-level biophysical properties and glycoprotein structure catalogue

Why tiny changes in a virus matter

Lassa fever affects tens of thousands of people in West Africa each year, yet we still lack a vaccine. The virus that causes it, Lassa virus, comes in several regional varieties, or lineages, that differ in how severe disease can be and how well they respond to antibodies. This study asks a simple but powerful question: at a physical level, how different are these viral lineages from one another, and could those differences help scientists design better vaccines and treatments?

Figure 1. How comparing basic protein traits across Lassa virus families can guide vaccine and drug development.

Looking at the virus as a collection of parts

The researchers treated the virus like an engineered object made of four main parts: a surface spike that lets it enter cells, a shell that packages its genetic material, an enzyme that copies its genome, and a small protein that helps new particles bud from infected cells. They assembled hundreds of high-quality sequences from public databases and measured straightforward traits for each protein, such as length, estimated weight, and which building-block amino acids it uses. By comparing these traits across lineages, they could see whether some lineages consistently make heavier or compositionally different versions of the same protein.
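
As a rough illustration of the traits described above, the sketch below computes length, estimated mass, and amino-acid composition for a protein sequence. It is a minimal example rather than the authors' pipeline: the function name `protein_traits` is hypothetical, the masses are standard average residue masses in daltons, and the two short sequences are invented toy strings, not real Lassa virus proteins.

```python
from collections import Counter

# Standard average residue masses in daltons (one water is added per chain).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528


def protein_traits(seq):
    """Return length, estimated mass, and residue fractions for one sequence.

    Hypothetical helper for illustration; assumes only the 20 standard
    amino acids appear in `seq`.
    """
    seq = seq.upper()
    counts = Counter(seq)
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    composition = {aa: n / len(seq) for aa, n in counts.items()}
    return {"length": len(seq), "mass_da": round(mass, 2), "composition": composition}


# Toy sequences (invented): same length, different residue choices.
light = protein_traits("MGAVAVT")
heavy = protein_traits("MGRQRQT")
print(light["length"], heavy["length"])
print(light["mass_da"], heavy["mass_da"])
```

Two sequences of identical length can differ in mass purely because of which residues they use, which is the kind of comparison the study makes across lineages.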

Heavier proteins without getting longer

One striking pattern appeared in the two proteins encoded on the virus’s small genome segment: the surface glycoprotein complex and the nucleoprotein. In Nigeria, two lineages, called II and III, circulate in different regions. Their glycoproteins almost always have the same length, yet lineage III’s version is on average about 180 units of mass heavier than lineage II’s. The same trend occurs in the nucleoprotein, where lineage III’s version is roughly 140 units heavier despite having exactly the same number of amino acids. This means that lineage III prefers slightly heavier amino acids at a subset of positions, giving its proteins more mass without adding extra length.

Pinpointing where and how proteins differ

To find where these differences sit along the glycoprotein, the team used a combination of machine learning and statistical measures. A random forest classifier learned to guess a sequence’s lineage using only its overall amino-acid makeup and achieved high accuracy, showing that each lineage has a subtle but distinct chemical “accent.” When the authors zoomed in on individual positions, they found that more than half of the glycoprotein is effectively shared between lineages II and III, while a smaller set of sites shows strong lineage bias. At these positions, lineage III tends to use heavier residues such as arginine and glutamine, whereas lineage II favors lighter ones like valine and alanine. Adding up these small shifts across the sequence explains the overall mass gap between the two lineages.
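
The position-by-position idea can be sketched in a few lines. This example does not reproduce the authors' random forest or their statistical measures; it swaps in a much simpler consensus comparison to show how small per-site shifts add up to an overall mass gap. The function names and the aligned toy sequences are hypothetical, and only the residues needed for the toy data are included in the mass table.

```python
from collections import Counter

# Standard average residue masses in daltons for the residues used below.
RESIDUE_MASS = {
    "M": 131.1926, "G": 57.0519, "A": 71.0788, "V": 99.1326,
    "T": 101.1051, "S": 87.0782, "R": 156.1875, "Q": 128.1307,
}


def consensus(seqs):
    """Most common residue at each column of equal-length aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def biased_sites(group_a, group_b):
    """List positions (1-based) where the two group consensuses differ.

    Each entry is (position, residue_a, residue_b, mass_shift_in_daltons),
    where the shift is residue_b minus residue_a.
    """
    sites = []
    for pos, (a, b) in enumerate(zip(consensus(group_a), consensus(group_b)), start=1):
        if a != b:
            sites.append((pos, a, b, round(RESIDUE_MASS[b] - RESIDUE_MASS[a], 2)))
    return sites


# Invented toy alignments standing in for two lineages.
lineage_ii = ["MGAVT", "MGAVT", "MGAVS"]
lineage_iii = ["MGRQT", "MGRQT", "MGRVT"]

sites = biased_sites(lineage_ii, lineage_iii)
total_shift = sum(shift for _, _, _, shift in sites)
print(sites)
print(round(total_shift, 2))
```

In this toy case most columns are shared, and the whole mass difference comes from two sites where the second group prefers arginine and glutamine over alanine and valine.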

A tiny insertion that the virus can tolerate

The glycoprotein’s length also varies by a single amino acid between some lineages. Most viruses in lineages IV and V have a 491-amino-acid glycoprotein, while lineages II, III, and VII more often have 490. By carefully realigning sequences and building a large family tree, the researchers traced this size difference to a short insertion or deletion near the very beginning of the glycoprotein, around positions 60 and 61. They then used modern structure prediction software to model more than 600 versions of the three-part glycoprotein spike.
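
To make the idea of tracing a one-residue length difference concrete, here is a minimal sketch that locates a single inserted residue between two otherwise identical sequences. It is far simpler than the realignment and family-tree analysis described above: the function `locate_single_insertion` is a hypothetical helper, and the sequences are invented, not real glycoprotein fragments.

```python
def locate_single_insertion(longer, shorter):
    """Return (1-based position, residue) of the one extra residue in `longer`.

    Assumes the sequences are identical apart from a single inserted residue.
    If the insertion falls inside a run of identical residues its exact
    placement is ambiguous; this reports the last position in that run.
    """
    if len(longer) != len(shorter) + 1:
        raise ValueError("expected a length difference of exactly one")
    i = 0
    # Walk forward while the two sequences agree.
    while i < len(shorter) and longer[i] == shorter[i]:
        i += 1
    # After skipping the extra residue, the remainders must match.
    if longer[i + 1:] != shorter[i:]:
        raise ValueError("sequences differ by more than a single insertion")
    return i + 1, longer[i]


# Invented toy sequences: the longer one carries an extra K.
short_seq = "MGQIVTFFQEV"
long_seq = "MGQIVTKFFQEV"
print(locate_single_insertion(long_seq, short_seq))
```

Real sequences also differ by substitutions, which is why the study relied on careful alignment rather than a direct comparison like this one.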

Figure 2. How small amino acid changes reshape Lassa virus surface spikes and influence antibody binding in lab tests.

From computer models to cells

The structural models showed that the extra residue sits on the outer surface of the glycoprotein head and does not disturb the core architecture shared across lineages. To test whether this tiny change matters in living cells, the team expressed several representative glycoproteins, with and without the extra amino acid, in human cell lines. They measured how well the proteins reached the cell surface and how strongly they were bound by two conformation-sensitive antibodies. Across constructs, expression and antibody binding were very similar, suggesting that this particular length difference is well tolerated and does not greatly alter how the glycoprotein is displayed or recognized in this setting.

What this means for future vaccines

Overall, the study shows that Lassa virus lineages are not identical at the level of physical protein properties. Lineage III in particular tends to produce slightly heavier versions of key proteins by favoring heavier amino acids at specific positions, while a small insertion near the glycoprotein’s tip appears structurally and functionally acceptable. By pairing population-scale sequence analysis with structure modeling and lab tests, the authors provide a detailed catalogue of glycoprotein shapes across lineages. This resource can help vaccine and drug designers choose representative targets and focus on regions that are conserved in structure, even when the virus’s sequence continues to evolve.

Citation: Daodu, R.O., Riccabona, J.R., Peter, A.S. et al. Sequence to structure insights into Lassa virus population-level biophysical properties and glycoprotein structure catalogue. npj Viruses 4, 26 (2026). https://doi.org/10.1038/s44298-026-00196-3

Keywords: Lassa virus, glycoprotein, viral lineages, protein structure, vaccine design