Clear Sky Science · en
From sleep staging to spindle detection: a case study on end-to-end automated sleep analysis
Why faster sleep studies matter
Sleep tests can reveal how the brain behaves at night, offering clues about conditions like depression, bipolar disorder, and memory problems. Yet the detailed scoring of these recordings is so time-consuming that many studies are small and slow to complete. This paper asks a simple question with big consequences: can computers analyze sleep recordings as reliably as human experts, and do it fast enough to unlock much larger studies?
Looking inside a night in the lab
In a typical sleep study, a person spends the night in a lab while sensors record their brain waves, eye movements, and muscle activity. Later, trained specialists review the signals in 30-second segments to label broad sleep stages, such as light sleep and deep sleep. They also pick out brief events like sleep spindles, short bursts of brain activity linked to learning and brain health. The authors focus on two key questions: can modern computer models assign sleep stages accurately, and can they then spot spindles well enough to test scientific ideas about mental illness?
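To make the 30-second scoring unit concrete, here is a minimal sketch of cutting a recording into fixed-length segments. The function name, the 100 Hz sampling rate, and the dummy signal are illustrative assumptions and are not taken from the paper.

```python
def split_into_epochs(signal, sampling_rate_hz, epoch_seconds=30):
    """Cut a 1-D signal into consecutive, non-overlapping scoring epochs.

    Any trailing samples that do not fill a whole epoch are dropped.
    """
    samples_per_epoch = int(sampling_rate_hz * epoch_seconds)
    n_epochs = len(signal) // samples_per_epoch
    return [
        signal[i * samples_per_epoch:(i + 1) * samples_per_epoch]
        for i in range(n_epochs)
    ]


# Hypothetical example: 95 seconds of a flat dummy signal at an assumed 100 Hz.
dummy_signal = [0.0] * (95 * 100)
epochs = split_into_epochs(dummy_signal, sampling_rate_hz=100)
print(len(epochs), len(epochs[0]))
```

Each resulting segment is what a human scorer, or a staging model, would assign a single sleep-stage label to.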

Teaching machines to read brain waves
The team used two state-of-the-art deep learning models. One model, called RobustSleepNet, takes in long stretches of brain wave data and labels each short segment as wakefulness or one of several sleep stages. A second model, SUMOv2, examines sections of light non-REM sleep and marks the precise moments when spindles appear. Both models were trained on large, previously collected datasets and had never seen the bipolar disorder recordings used in this case study, making the test closer to how they would work in real clinics and research projects.
Matching and even surpassing human scorers
To judge whether the models were trustworthy, the authors compared their decisions to those of multiple human experts. For sleep staging, the computer’s labels agreed with experts about as well as different experts agreed with each other, and in one large dataset the model even matched the group consensus better than a typical pair of scorers did. For spindle detection, SUMOv2 reached agreement levels that fell within or above the range usually seen between human pairs, and it performed especially well when compared to a combined consensus of many scorers. These checks suggest that the automated tools are operating at expert level rather than offering a crude shortcut.
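This summary does not say which agreement statistic the authors used. One common choice for comparing two sets of sleep-stage labels is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is an illustration under that assumption; the stage labels and the example sequences are made up.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Fraction of epochs on which both scorers chose the same stage.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each scorer labelled at random with their own
    # stage frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[s] * counts_b[s] for s in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both scorers used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical per-epoch labels from one expert and one model.
expert = ["W", "N1", "N2", "N2", "N3", "N3", "REM", "REM"]
model = ["W", "N2", "N2", "N2", "N3", "N3", "REM", "W"]
kappa = cohens_kappa(expert, model)
print(round(kappa, 2))
```

Computing the same statistic for expert-versus-expert pairs gives the human baseline against which a model's score can be judged.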

What the models reveal about bipolar disorder
Armed with these automated tools, the researchers reanalyzed data from a previous study comparing people with bipolar disorder to healthy volunteers. In the original work, an expert spent months marking sleep stages and spindles by hand and found that bipolar patients had fewer fast spindles per minute of light sleep, a pattern that might serve as a marker of the illness. The automated pipeline reproduced this key difference in fast spindle density between patient and control groups, and it also echoed more subtle trends, such as slightly lower spindle frequencies in patients, although not every detail reached statistical significance in the new analysis.
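The quantity at the centre of this comparison, fast spindles per minute of light sleep, is simple arithmetic once spindles have been detected. The sketch below assumes each detected spindle comes with a frequency in Hz and that "fast" means at or above 13 Hz. That cutoff, the function name, and the example numbers are illustrative assumptions, not values from the paper.

```python
def fast_spindle_density(spindle_freqs_hz, light_sleep_epoch_count,
                         epoch_seconds=30, fast_threshold_hz=13.0):
    """Fast spindles per minute of light sleep.

    spindle_freqs_hz: one frequency per spindle detected during light sleep.
    light_sleep_epoch_count: number of epochs scored as light sleep.
    fast_threshold_hz: assumed cutoff separating fast from slow spindles.
    """
    light_sleep_minutes = light_sleep_epoch_count * epoch_seconds / 60
    if light_sleep_minutes <= 0:
        raise ValueError("need at least one light-sleep epoch")
    fast_count = sum(1 for f in spindle_freqs_hz if f >= fast_threshold_hz)
    return fast_count / light_sleep_minutes


# Hypothetical night fragment: six spindles in four 30-second epochs.
freqs = [11.5, 12.8, 13.2, 14.0, 13.6, 12.1]
density = fast_spindle_density(freqs, light_sleep_epoch_count=4)
print(density)
```

Because the density depends on both the sleep-stage labels (the denominator) and the spindle detections (the numerator), errors in either automated step can shift the result, which is why the study checks both.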
Clearing a path to larger, fairer sleep research
Although the automatic counts and exact values did not match the original expert results in every respect, the broad patterns were similar, and the models’ agreement with experts was on par with, or better than, the agreement typically seen between human experts. This suggests that fully automated pipelines can stand in for manual scoring when testing certain scientific questions, especially those focused on group differences rather than single patients. By making their code and a privacy-preserving online tool called SomnoBot freely available, the authors aim to help researchers around the world analyze sleep recordings quickly and consistently, opening the door to larger studies of how disturbed sleep and brain health are linked.
Citation: Grieger, N., Mehrkanoon, S., Ritter, P. et al. From sleep staging to spindle detection: a case study on end-to-end automated sleep analysis. Sci Rep 16, 16014 (2026). https://doi.org/10.1038/s41598-026-53891-9
Keywords: sleep analysis, EEG, sleep spindles, bipolar disorder, deep learning