Clear Sky Science

FLASH-MM: fast and scalable single-cell differential expression analysis using linear mixed-effects models

2026-02-05

Why tiny cells need big computing help

Modern biology can now read the activity of thousands of genes in hundreds of thousands of individual cells at once. This single-cell view promises sharper insight into how our bodies fight infections, differ between men and women, or develop disease. But turning these huge, messy datasets into trustworthy discoveries is painfully slow and, if done naively, can be misleading. This paper introduces FLASH-MM, a new way to crunch single-cell data that keeps the statistics honest while making the computing fast enough for today’s largest studies.

The challenge of noisy, crowded cell data

Single-cell RNA sequencing measures which genes are “on” or “off” in each cell, across many people and conditions. Cells from the same person tend to look alike because they share genes and life history, while people differ widely from each other. This creates a layered structure in the data: many cells within each person, and many people within each condition, such as sick versus healthy. If these relationships are ignored, standard methods can mistakenly label thousands of genes as changed when they are not, simply because they treat every cell as an independent data point. At the same time, single-cell datasets have exploded in size, now including hundreds of subjects and up to millions of cells, stretching conventional statistical tools past their limits in both time and memory.
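The inflation described above can be reproduced with a small simulation. This is a minimal sketch, not the paper's analysis: the donor count, cell count, and variances are invented for illustration, and comparing per-donor means stands in for a donor-aware method. The simulated gene has no true difference between conditions, so any "significant" call is a false positive.

```python
import numpy as np

N_DONORS = 10          # assumed: 5 "healthy" and 5 "sick" donors
CELLS_PER_DONOR = 100  # assumed
T_CRIT_8DF = 2.306     # two-sided 5% critical value for a t-test with 8 df


def welch_t(a, b):
    """Two-sample t statistic (equals the pooled statistic when sizes match)."""
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se


def false_positive_rates(n_sim=200, donor_sd=0.5, cell_sd=1.0, seed=0):
    """Return (naive_rate, donor_aware_rate) for a gene with NO true effect."""
    rng = np.random.default_rng(seed)
    half = N_DONORS // 2
    naive_hits = 0
    donor_hits = 0
    for _ in range(n_sim):
        # Each donor has a random baseline; that donor's cells scatter around it.
        baseline = rng.normal(0.0, donor_sd, size=N_DONORS)
        noise = rng.normal(0.0, cell_sd, size=(N_DONORS, CELLS_PER_DONOR))
        cells = baseline[:, None] + noise
        healthy, sick = cells[:half], cells[half:]

        # Naive analysis: every cell counts as an independent observation.
        if abs(welch_t(healthy.ravel(), sick.ravel())) > 1.96:
            naive_hits += 1

        # Donor-aware analysis: one mean per donor, so the sample size is 10.
        if abs(welch_t(healthy.mean(axis=1), sick.mean(axis=1))) > T_CRIT_8DF:
            donor_hits += 1
    return naive_hits / n_sim, donor_hits / n_sim


if __name__ == "__main__":
    naive, donor_aware = false_positive_rates()
    print(f"naive false-positive rate:       {naive:.2f}")
    print(f"donor-aware false-positive rate: {donor_aware:.2f}")
```

Under these assumed settings the naive test flags the null gene far more often than the nominal 5%, while the donor-level comparison stays close to it. The gap widens as more cells are sequenced per donor, because the naive test grows more confident without gaining any new donors.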

A smarter way to model people and cells

To cope with these complexities, statisticians often turn to linear mixed-effects models, which explicitly separate consistent differences between conditions (for example, tuberculosis status or sex) from random differences between individuals. In principle, these models are ideal for single-cell studies because they can account for both the similarities among cells from the same person and the variation across people. In practice, however, widely used software for these models slows to a crawl or runs out of memory on large single-cell experiments. Researchers therefore often fall back on shortcuts, such as averaging counts across all cells of the same type within each person, which throws away much of the fine-grained cell-to-cell information that makes single-cell data so powerful.
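In generic textbook notation, a model of this kind for a single gene can be written as follows. The symbols are the conventional ones for mixed models and are assumed here for illustration, not quoted from the paper.

```latex
y = X\beta + Zb + \varepsilon, \qquad
b \sim \mathcal{N}(0, \sigma_b^2 I_q), \qquad
\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n), \qquad
\operatorname{Cov}(y) = \sigma_b^2 ZZ^\top + \sigma^2 I_n
```

Here y holds the gene's expression in n cells, X holds the known covariates (such as condition or sex) with fixed effects β, and Z assigns each cell to one of q donors, each with its own random shift in b. Two cells from the same donor share an entry of b, which is what makes them correlated. Averaging cells within each donor collapses y to q values and discards the cell-level term ε.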

How FLASH-MM speeds up the heavy lifting

FLASH-MM keeps the strengths of mixed-effects models while re-engineering how the calculations are done. Instead of repeatedly passing through giant tables of cell-by-gene measurements, FLASH-MM first distills each dataset into a compact set of summary numbers that capture how cells relate to known features such as library size, cell type, treatment, or donor. The core algorithm then works only with these smaller matrices, so the computational burden no longer grows with the number of cells but with the much smaller number of terms in the model, such as covariates and donors. The authors also tweak the way model variability is represented so that standard statistical tests remain valid, allowing simple t- and z-statistics to assess both the main effects of interest and the added value of including person-to-person variation. Simulation studies using realistic artificial data show that FLASH-MM’s answers match those from gold-standard software down to several decimal places, while running between roughly 50 and 140 times faster and using far less memory.
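The summary-statistics idea can be illustrated with a small numerical sketch. It is not the FLASH-MM implementation: the function names and toy data are hypothetical, and the ratio of donor variance to cell variance (`theta`) is treated as known, whereas a real fit would estimate it. The sketch shows only that, once a few small cross-product matrices are computed in one pass, the effect estimates can be recovered without touching the cell-level data again.

```python
import numpy as np


def summarize(X, Z, y):
    """One pass over cell-level data. Output sizes depend only on the number
    of covariates (p) and donors (q), not on the number of cells (n)."""
    return {
        "XX": X.T @ X, "XZ": X.T @ Z, "ZZ": Z.T @ Z,
        "Xy": X.T @ y, "Zy": Z.T @ y,
    }


def beta_from_summaries(s, theta):
    """Fixed effects for y = X beta + Z b + e, using only the summaries.
    theta = var(b) / var(e) is ASSUMED known in this sketch."""
    q = s["ZZ"].shape[0]
    M = s["ZZ"] + np.eye(q) / theta  # q x q, from the Woodbury identity
    A = s["XX"] - s["XZ"] @ np.linalg.solve(M, s["XZ"].T)
    c = s["Xy"] - s["XZ"] @ np.linalg.solve(M, s["Zy"])
    return np.linalg.solve(A, c)


def beta_direct(X, Z, y, theta):
    """Same estimate computed the expensive way, with an n x n matrix."""
    n = X.shape[0]
    V = np.eye(n) + theta * (Z @ Z.T)
    return np.linalg.solve(X.T @ np.linalg.solve(V, X),
                           X.T @ np.linalg.solve(V, y))


def make_toy_data(n_donors=8, cells_per_donor=50, seed=1):
    """Hypothetical data: intercept, condition, and a library-size covariate."""
    rng = np.random.default_rng(seed)
    n = n_donors * cells_per_donor
    donor = np.repeat(np.arange(n_donors), cells_per_donor)
    condition = (donor >= n_donors // 2).astype(float)
    log_lib = rng.normal(0.0, 1.0, n)
    X = np.column_stack([np.ones(n), condition, log_lib])
    Z = np.eye(n_donors)[donor]  # one indicator column per donor
    y = (X @ np.array([1.0, 0.5, 0.3])
         + Z @ rng.normal(0.0, 0.7, n_donors)
         + rng.normal(0.0, 1.0, n))
    return X, Z, y


if __name__ == "__main__":
    X, Z, y = make_toy_data()
    s = summarize(X, Z, y)
    theta = 0.49  # assumed variance ratio
    print(np.allclose(beta_from_summaries(s, theta),
                      beta_direct(X, Z, y, theta)))
```

In this toy case the largest matrix the summary route has to solve is 8 by 8, against 400 by 400 for the direct route, and the summary matrices would stay the same size if each donor contributed thousands of cells. A full fit would also need the sum of squares of y to estimate the variance components themselves.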

Putting the method to work in real tissues

To demonstrate real-world impact, the team applied FLASH-MM to two demanding single-cell datasets. In a map of over 27,000 healthy human kidney cells from 19 donors, FLASH-MM searched for gene activity differences between male and female donors within each cell type, while treating each person as a random factor to avoid overconfident results. It found the strongest sex-linked patterns in a specific kidney tubule cell type, where male cells favored pathways related to acid handling and blood pressure, and female cells showed enrichment for signaling and receptor recycling processes. FLASH-MM completed this analysis in about a minute, compared with nearly two hours for a standard tool. The method also analyzed roughly half a million memory T cells from 259 people in a tuberculosis cohort, identifying sets of genes and pathways linked to disease status in different activated T cell states. Here, FLASH-MM finished in under an hour and a half, versus more than two days for the conventional approach.

What this means for future cell-by-cell studies

From a lay standpoint, the message is that we can now make better use of the flood of single-cell data without cutting corners. FLASH-MM keeps track of which cells came from which person and condition, so that detected gene changes are more likely to reflect genuine biology rather than quirks of sampling or batch. At the same time, its lean computations make it feasible to analyze hundreds of thousands of cells on standard computers, opening the door to more ambitious studies of subtle disease signals, sex differences, and rare cell states. Because the approach is general and available in both R and Python, it can be extended to newer technologies such as spatial gene mapping and multi-layer molecular measurements, helping researchers turn vast cell-level datasets into robust, clinically relevant insights.

Citation: Xu, C., Pouyabahar, D., Voisin, V. et al. FLASH-MM: fast and scalable single-cell differential expression analysis using linear mixed-effects models. Nat Commun 17, 2384 (2026). https://doi.org/10.1038/s41467-026-69063-2

Keywords: single-cell RNA sequencing, differential expression, linear mixed-effects models, statistical genomics, computational biology