Clear Sky Science
SwarmMAP: swarm learning for decentralized cell type annotation in single cell sequencing data
Why this matters for future medicine
Every human organ is made from a rich cast of cell types, and new sequencing technologies now let scientists read the activity of individual cells one by one. This promises better understanding of diseases and more precise treatments. But turning millions of raw cellular readouts into reliable cell labels is slow, subjective, and often blocked by strict privacy rules around patient data. This study introduces SwarmMAP, a way for hospitals and labs to work together on this problem without ever sharing their raw data, opening the door to large, trustworthy cell maps that still protect patients.

The challenge of naming cells
Modern single-cell sequencing can profile gene activity in millions of cells from tissues such as heart, lung, and breast. To make sense of these data, researchers group similar cells and then assign each group a label such as “immune cell” or “blood vessel cell.” Today this step is mostly done by hand, with experts scanning long lists of genes and debating which markers define each cell type. Different groups may use different rules, making results hard to compare. On top of that, patient data are sensitive, so simply pooling all information in one place is often legally or ethically impossible. Scientists need a way to build shared, automatic cell labelers that respect privacy and scale to many organs and diseases.
A swarm instead of a central hub
SwarmMAP tackles this by using “swarm learning,” a collaborative style of machine learning in which multiple sites train a model together without moving their data. Each hospital or research center keeps its own single-cell data behind its firewall. Locally, it cleans the data, selects informative genes, and trains a simple neural network to predict cell types. From time to time, only the model’s numerical settings (its learned parameters), not any patient data, are sent into a shared digital “swarm” built on a blockchain network. There, the settings from all partners are averaged and redistributed, so each site benefits from what the others have learned. This process repeats many times, steadily improving a common model while the underlying patient data never leave their home institutions.
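For readers who think in code, the train-locally-then-average loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SwarmMAP implementation: a one-layer softmax classifier stands in for the study's neural network, the average is a plain unweighted mean, the blockchain coordination layer is not modeled at all, and every function name and the synthetic "gene expression" data are invented for the example.

```python
import numpy as np

def local_update(weights, X, y, n_classes, lr=0.1, epochs=5):
    """A few gradient steps of softmax regression on ONE site's private data.
    (Assumption: a stand-in for the 'simple neural network' in the article.)"""
    W = weights.copy()
    Y = np.eye(n_classes)[y]                         # one-hot cell-type labels
    for _ in range(epochs):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (probs - Y) / len(X)         # cross-entropy gradient step
    return W

def average_weights(site_weights):
    """Merge step: only parameters are combined, never the cells themselves.
    (Assumption: unweighted mean over sites.)"""
    return np.mean(site_weights, axis=0)

def swarm_train(sites, n_features, n_classes, rounds=20):
    """Alternate local training and parameter averaging for several rounds."""
    W = np.zeros((n_features, n_classes))
    for _ in range(rounds):
        updates = [local_update(W, X, y, n_classes) for X, y in sites]
        W = average_weights(updates)
    return W

def make_site(rng, n_cells, centers):
    """Hypothetical synthetic data: each cell type is a noisy cluster."""
    y = rng.integers(0, len(centers), size=n_cells)
    X = centers[y] + rng.normal(scale=0.5, size=(n_cells, centers.shape[1]))
    return X, y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = 3.0 * np.eye(3, 4)                     # 3 toy cell types, 4 toy genes
    sites = [make_site(rng, 200, centers) for _ in range(3)]
    W = swarm_train(sites, n_features=4, n_classes=3)
    X_new, y_new = make_site(rng, 300, centers)
    accuracy = np.mean(np.argmax(X_new @ W, axis=1) == y_new)
    print(f"accuracy on unseen toy cells: {accuracy:.2f}")
```

Note that `swarm_train` only ever passes weight matrices between sites; each `(X, y)` pair is touched solely inside `local_update`, which mirrors the idea that raw data stay home.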
How well does the swarm learn?
The authors tested SwarmMAP on nearly two million cells from human heart, lung, and breast tissue, drawing on four separate studies for each organ. They compared three scenarios: training on a single study, on several studies combined at one site, and in the distributed swarm. Performance was measured by how accurately the models could assign the correct cell type or finer cell subtype. Across organs, the swarm models reached accuracies very close to those of models trained on fully combined data, with average scores around 0.9 out of 1. In other words, not having a central data warehouse did not meaningfully reduce quality. The study also showed that using more datasets generally improved results and helped the models handle a wider variety of cell types.

Where the approach struggles
The work highlights a familiar limitation in biology and in machine learning: rare and hard-to-define cell types are more difficult to classify. When certain cells appeared only in small numbers, or when their molecular signatures overlapped strongly with other cells, both the local and swarm models stumbled. This was particularly evident for some specialized immune cells and for “ischemic” heart cells that mix features of several lineages. The analysis confirmed that, across organs, common and well-characterized cell types were labeled with high accuracy, while rare or fuzzy categories remained challenging. In those difficult cases, the swarm models sometimes performed slightly worse than their locally trained counterparts, reflecting the limits of what the data themselves can support.
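A small calculation shows why rare cell types can stay hidden behind a good headline number. The sketch below compares overall accuracy with accuracy computed separately for each cell type. The labels are made up for illustration and are not results from the study, and the function names are hypothetical.

```python
from collections import Counter

def overall_accuracy(true_labels, predicted_labels):
    """Fraction of all cells whose predicted label matches the true label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)

def per_type_accuracy(true_labels, predicted_labels):
    """For each true cell type, the fraction of its cells labeled correctly."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cell_type: correct[cell_type] / n for cell_type, n in totals.items()}

if __name__ == "__main__":
    # Toy data (invented): 95 cells of a common type, 5 cells of a rare type.
    true_labels = ["common"] * 95 + ["rare"] * 5
    # The toy model gets every common cell right but only 1 of 5 rare cells.
    predicted = ["common"] * 95 + ["rare"] + ["common"] * 4
    print(overall_accuracy(true_labels, predicted))
    print(per_type_accuracy(true_labels, predicted))
```

In this toy case the overall score is high while the rare type is mostly mislabeled, which is the pattern the paragraph above describes.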
What this means for future cell atlases
For a lay reader, the key message is that SwarmMAP shows we can build powerful automatic labelers for single cells without pooling sensitive patient data in one place. By letting many centers train together in a privacy-preserving swarm, scientists can create more robust and reusable maps of the body’s cells. These models already perform nearly as well as centralized approaches and are likely to improve as more data and more organs are added. While some rare or ambiguous cell types still defy neat categorization, SwarmMAP offers a practical path toward large-scale, standardized cell atlases that respect both scientific rigor and patient privacy.
Citation: Saldanha, O.L., Goepp, V., Pfeiffer, K. et al. SwarmMAP: swarm learning for decentralized cell type annotation in single cell sequencing data. npj Syst Biol Appl 12, 41 (2026). https://doi.org/10.1038/s41540-026-00667-6
Keywords: single-cell sequencing, cell type annotation, privacy-preserving AI, decentralized learning, systems biology