Clear Sky Science
Systematic background selection with BasCoD enhances contrastive dimension reduction in single cell genomics
Why this research matters to everyday science readers
Modern biology can now measure the activity of thousands of genes in hundreds of thousands of individual cells at once. These powerful experiments are used to compare, for example, diseased versus healthy tissue or treated versus untreated cells. But making sense of such enormous datasets is tricky: important treatment effects can be hidden behind background differences that have nothing to do with the question at hand. This paper introduces BasCoD, a new statistical tool that helps scientists choose the right “background” data so that the real biological story stands out clearly.

Separating signal from noise in giant cell datasets
In single-cell genomics, researchers often compare a “target” group of cells, such as drug-treated cells, with a “background” group, such as untreated controls. To visualize these data, they compress thousands of gene measurements per cell into just a few coordinates, a process called dimension reduction. Contrastive dimension reduction goes a step further: it searches specifically for patterns that are strong in the target but weak in the background, helping highlight treatment-specific changes. However, these contrastive methods implicitly assume that the background data are well chosen. If the background behaves very differently from the target for unrelated reasons, the resulting plots can be misleading. Until now, there has been no formal way to check this assumption.
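To make the idea of "strong in the target but weak in the background" concrete, here is a minimal sketch of one well-known contrastive approach, contrastive PCA, which looks for directions that maximize target variance minus a weighted background variance. This summary does not say which contrastive methods the authors used, so the function name `contrastive_pca` and the weight `alpha` are illustrative assumptions rather than part of BasCoD.

```python
import numpy as np

def contrastive_pca(target, background, alpha=1.0, n_components=2):
    """Illustrative contrastive PCA (an assumption, not the paper's code).

    target, background: arrays of shape (n_cells, n_genes).
    alpha: how strongly background variance is penalized (hypothetical default).
    Returns the projected target cells and the contrastive directions.
    """
    # Center each dataset on its own mean.
    target_centered = target - target.mean(axis=0)
    background_centered = background - background.mean(axis=0)

    # Sample covariance matrices, genes x genes.
    cov_target = target_centered.T @ target_centered / (len(target) - 1)
    cov_background = (
        background_centered.T @ background_centered / (len(background) - 1)
    )

    # Directions with high target variance and low background variance are
    # the top eigenvectors of the (symmetric) difference matrix.
    eigvals, eigvecs = np.linalg.eigh(cov_target - alpha * cov_background)
    order = np.argsort(eigvals)[::-1][:n_components]
    directions = eigvecs[:, order]

    return target_centered @ directions, directions
```

In this sketch, a background that carries large variation the target lacks would make the subtraction remove the wrong structure, which is the failure mode a background check is meant to catch.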
A new way to judge background data
BasCoD (Background Selection for Contrastive Dimension Reduction) provides a mathematical test for deciding whether a candidate background dataset is appropriate. The central idea is intuitive: for a background to be valid, it should not contain strong structures that the target lacks. In technical terms, the low-dimensional “space” describing the background should sit entirely inside the space describing the target. BasCoD takes the low-dimensional representations produced by standard tools such as principal component analysis or modern neural-network-based embeddings, then compares how the target and background spaces overlap. If the background contains extra, distinct structure, BasCoD returns a very small p-value, signaling that this background is likely to distort contrastive analysis rather than clarify it.
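The geometric idea described here, that the background's low-dimensional space should lie inside the target's, can be illustrated with principal angles between subspaces. The sketch below is not BasCoD's test and produces no p-value; it only measures how far a background subspace sticks out of a target subspace. The function names `pca_basis` and `largest_principal_angle` are hypothetical, and PCA stands in for whichever embedding is actually used.

```python
import numpy as np

def pca_basis(data, n_components):
    """Orthonormal basis (n_features x n_components) of the top PCA subspace."""
    centered = data - data.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components].T

def largest_principal_angle(background_basis, target_basis):
    """Largest principal angle, in degrees, between two subspaces.

    Both inputs must have orthonormal columns. An angle near 0 means the
    background subspace lies (almost) inside the target subspace; an angle
    near 90 means the background has a direction the target lacks.
    """
    if background_basis.shape[1] > target_basis.shape[1]:
        raise ValueError("background subspace is larger than target subspace")
    # Singular values of the cross-product are cosines of the principal angles.
    cosines = np.linalg.svd(target_basis.T @ background_basis, compute_uv=False)
    smallest_cosine = np.clip(cosines.min(), -1.0, 1.0)
    return float(np.degrees(np.arccos(smallest_cosine)))
```

A formal test additionally has to account for sampling noise in the estimated subspaces, which is the role the p-value plays in the text above.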
Lessons from real biological case studies
The authors apply BasCoD to a series of real datasets where contrastive methods have been used. In a study of mouse brain protein measurements, shock-treated mice were compared with untreated controls. Earlier work showed that using control mice as background allowed subtle differences between two genetic groups to emerge clearly. BasCoD agreed: it assigned a moderate p-value, meaning the test found no evidence against this background choice. In contrast, for human stem cells differentiating into neurons, the team found that using very early-stage cells as background for late-stage, stressed cells produced almost no improvement in the separation of key donor-specific traits. BasCoD sharply rejected this early time point as a valid background but endorsed later control samples that shared more structure with the stressed cells, matching biological expectations.
Guiding complex time courses and perturbation experiments
BasCoD also helps in more intricate situations, such as tracking cells over developmental “trajectories” or across many experimental conditions. In human bone marrow data, the method showed that some blood cell lineages could serve as good backgrounds for stem cells, while others were too distinct, and this aligned with known behavior of critical genes. In mouse intestine data, the authors deliberately constructed poor background sets with non-overlapping cell types; BasCoD flagged these as invalid. By progressively removing incompatible cell types and retesting, they arrived at a calibrated background that, when fed into a contrastive method, clearly separated cells infected by different pathogens. In designed experiments on blood cell differentiation under inflammatory signals, BasCoD identified which combinations of time and treatment produced trustworthy contrasts and which would lead to muddled interpretations and misleading gene-enrichment results.

Finding hidden interactions between gene perturbations
The study further demonstrates that BasCoD can uncover subtle interaction effects in large-scale CRISPR perturbation screens, where genes are silenced one at a time or in pairs. By treating cells with double-gene perturbations as the target and single-gene perturbations as background, the authors used BasCoD to test whether the variability of the double perturbation could be explained by simply combining the single-gene effects. Gene pairs from the same functional family tended to violate this assumption, leading to strong rejections and signaling non-additive behavior. For one such pair, the team showed that many genes changed in ways that could not be predicted from either single perturbation alone, highlighting BasCoD’s ability to flag combinations that produce genuinely new cellular states.
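As a much simpler illustration of what "non-additive" means, the sketch below compares the observed average expression under a double perturbation with the value expected if the two single-gene effects simply added up. This is a generic mean-shift calculation under stated assumptions, not BasCoD's subspace-based test; the function name `additive_residual` and the use of unperturbed control cells as a baseline are hypothetical choices made for this example.

```python
import numpy as np

def additive_residual(control, single_a, single_b, double):
    """Per-gene gap between observed and additively expected expression.

    Each argument is an array of shape (n_cells, n_genes) for one condition:
    unperturbed controls, perturbation A, perturbation B, and A+B together.
    Values near zero are consistent with additive effects; large absolute
    values mark genes whose response is not predicted by the singles.
    """
    baseline = control.mean(axis=0)
    effect_a = single_a.mean(axis=0) - baseline
    effect_b = single_b.mean(axis=0) - baseline

    # Additive expectation: baseline plus both single-gene shifts.
    expected_double = baseline + effect_a + effect_b
    return double.mean(axis=0) - expected_double
```

Ranking genes by the absolute residual gives a rough list of candidates whose response to the pair is not predicted by either perturbation alone.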
What this means for future single-cell studies
Overall, BasCoD gives researchers a principled way to ask a previously neglected question: “Is my background data actually suitable for this contrast?” By quantifying how well a candidate background fits within the structure of the target data, BasCoD helps prevent misleading visualizations and downstream analyses in studies that compare treatments, time points, cell types, or gene perturbations. For non-specialists, the key message is that the choice of what counts as “background” in big biological datasets is not just a matter of convenience. With a tool like BasCoD, scientists can systematically design and verify these choices, leading to clearer pictures of how cells respond to drugs, infections, inflammation, and genetic changes.
Citation: Park, K., Sun, Z., Liao, R. et al. Systematic background selection with BasCoD enhances contrastive dimension reduction in single cell genomics. Nat Commun 17, 4077 (2026). https://doi.org/10.1038/s41467-026-70652-4
Keywords: single-cell genomics, dimension reduction, contrastive analysis, background selection, CRISPR perturbation