Clear Sky Science

Use of machine learning and voice for multiclass classification of Parkinson’s disease, chronic obstructive pulmonary disease, and healthy controls

2026-05-19

Listening to Illness Through the Human Voice

Most of us rarely think about how much our voices reveal about our health. Yet subtle changes in pitch, steadiness, or breathiness can carry clues about disorders that affect the brain and lungs. This study explores whether a short recording of someone holding the vowel “ah” into a smartphone, combined with modern machine learning, can help tell apart people with Parkinson’s disease, those with chronic obstructive pulmonary disease (COPD), and healthy older adults.

Figure 1. Simple phone-recorded vowel sounds flow into a model that sorts voices into Parkinson’s, COPD, or healthy groups.

Why Parkinson’s and COPD Affect How We Sound

Parkinson’s disease is best known for tremor and stiffness, but it also often makes speech softer, more monotone, and less clear. COPD, a long-term lung disease, narrows the airways and makes breathing difficult, which can in turn make the voice weak, hoarse, or breathy. Although both illnesses disturb the simple act of producing sound, doctors still lack quick and objective tests based on voice. Most earlier research has asked computers to decide only between “patient” and “healthy,” usually for one disease at a time and within one language. The authors instead asked a tougher and more realistic question: can a single system listen to very simple speech sounds, in different languages, and sort people into three groups at once?

How the Researchers Collected and Shaped the Voices

The team combined two large voice collections recorded on mobile devices. One, from the mPower project, contained English speakers with Parkinson’s disease and healthy volunteers. The other, called COPDVD, contained Swedish speakers with COPD and matched healthy controls. To make the groups comparable, the researchers selected similar numbers of men and women, with similar ages and similar numbers of recordings per person, ending up with 96 people and 1,723 usable recordings of sustained “ah.” They removed silent segments, then turned each recording into a set of 102 numerical features that captured basic voice measures like pitch and roughness, as well as detailed spectral fingerprints known as Mel Frequency Cepstral Coefficients.
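To make the silence-removal step concrete, here is a minimal sketch of one common approach: cut the signal into short frames and drop those whose energy is far below the loudest frame. The function name `trim_silence`, the frame length, and the relative threshold are illustrative assumptions; the article does not say how the authors detected silence, and the 102 features themselves are not reproduced here.

```python
import math

def trim_silence(samples, frame_len=400, rel_threshold=0.1):
    """Keep only frames whose RMS energy is at least rel_threshold times
    the RMS of the loudest frame (both parameters are assumed values)."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    rms = [math.sqrt(sum(x * x for x in f) / len(f)) for f in frames]
    peak = max(rms, default=0.0)
    if peak == 0.0:
        return []  # empty or entirely silent input
    kept = []
    for frame, energy in zip(frames, rms):
        if energy >= rel_threshold * peak:
            kept.extend(frame)
    return kept

# Synthetic stand-in for a recording at 8 kHz:
# 0.1 s of silence, 0.2 s of a 150 Hz tone, 0.1 s of silence.
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 150 * n / sr) for n in range(1600)]
signal = [0.0] * 800 + tone + [0.0] * 800

voiced = trim_silence(signal)
print(len(signal), len(voiced))  # 3200 1600
```

In a real pipeline the trimmed signal would then be passed to a feature extractor that computes pitch, perturbation measures, and MFCCs.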

Figure 2. One voice becomes acoustic patterns, passes through four models that vote together, and ends as three separated voice clusters.

Teaching a Voting Team of Algorithms to Listen

Instead of trusting a single machine learning method, the researchers built a “voting committee” of four different classifiers. Each algorithm listened to a recording’s feature set and produced its own guess about whether it came from Parkinson’s disease, COPD, or a healthy control, along with a probability for each option. These probabilities were then averaged, and the class with the highest average became the final answer, so that the result reflected the group’s consensus. To guard against overfitting, the team used a strict training strategy: models were tuned and evaluated repeatedly on separate folds of the data (cross-validation), and the final performance was judged on a completely separate set of people whose recordings the algorithms had never encountered during training.
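The two ideas in this section, averaging probabilities across models and holding out whole people rather than individual recordings, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the class abbreviations, the probability values, the `subject` field, and the 20 percent test fraction are all made up for the example.

```python
import random

CLASSES = ["PD", "COPD", "HC"]  # HC = healthy control (abbreviation assumed)

def soft_vote(prob_rows):
    """Average per-class probabilities across models; return (label, means)."""
    n_models = len(prob_rows)
    mean = [sum(row[k] for row in prob_rows) / n_models
            for k in range(len(CLASSES))]
    best = max(range(len(CLASSES)), key=lambda k: mean[k])
    return CLASSES[best], mean

def split_by_subject(recordings, test_fraction=0.2, seed=0):
    """Hold out whole subjects so no person is in both train and test."""
    subjects = sorted({r["subject"] for r in recordings})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(test_fraction * len(subjects)))
    test_ids = set(subjects[:n_test])
    train = [r for r in recordings if r["subject"] not in test_ids]
    test = [r for r in recordings if r["subject"] in test_ids]
    return train, test

# Made-up outputs of four hypothetical classifiers for one recording.
probs = [
    [0.70, 0.10, 0.20],
    [0.40, 0.15, 0.45],  # this model alone would have said "HC"
    [0.60, 0.20, 0.20],
    [0.50, 0.10, 0.40],
]
label, mean = soft_vote(probs)
print(label, [round(p, 4) for p in mean])  # PD [0.55, 0.1375, 0.3125]

# Ten hypothetical subjects with three recordings each.
recordings = [{"subject": s, "take": t} for s in range(10) for t in range(3)]
train, test = split_by_subject(recordings)
```

Splitting by person matters because several recordings from the same speaker are highly similar; if they landed on both sides of the split, the test score would overstate how well the system handles new people.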

What the System Heard in the Voices

On this independent test set, the ensemble reached about 84 percent overall accuracy and a balanced F1 score just under 0.84, meaning it performed well across all three groups despite differences in sample sizes. The system was especially good at spotting Parkinson’s disease, which showed the highest precision and recall. Healthy voices were classified with intermediate success, while COPD voices were hardest to identify and were most often confused with healthy recordings. Notably, Parkinson’s and COPD were rarely mistaken for one another, suggesting that their vocal signatures, although both abnormal, differ in ways the algorithms could detect. When the researchers examined how vowels filled the acoustic “space” defined by their resonant frequencies, they found subtle but consistent shifts and spreads between the three groups, even though the languages differed.
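For readers unfamiliar with these metrics, the sketch below shows how accuracy, per-class precision and recall, and an averaged F1 score follow from a three-class confusion matrix. The counts are invented purely for illustration and are NOT the study's results; the sketch also assumes that the "balanced" F1 is a macro average (the unweighted mean of the per-class F1 scores), which the article does not state.

```python
def per_class_metrics(confusion, labels):
    """confusion[i][j] = recordings with true class i predicted as class j."""
    metrics = {}
    for i, name in enumerate(labels):
        tp = confusion[i][i]
        predicted = sum(row[i] for row in confusion)  # column total
        actual = sum(confusion[i])                    # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

def accuracy(confusion):
    """Fraction of all recordings on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def macro_f1(metrics):
    """Unweighted mean of per-class F1, so small classes count equally."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)

labels = ["PD", "COPD", "HC"]
# Illustrative counts only, NOT taken from the paper.
confusion = [
    [45, 1, 4],   # true PD
    [1, 30, 9],   # true COPD
    [3, 6, 41],   # true HC
]

metrics = per_class_metrics(confusion, labels)
for name in labels:
    m = metrics[name]
    print(name, round(m["precision"], 3), round(m["recall"], 3), round(m["f1"], 3))
print("accuracy", round(accuracy(confusion), 3))
print("macro F1", round(macro_f1(metrics), 3))
```

In this invented matrix the off-diagonal counts between PD and COPD are small while those between COPD and HC are larger, mirroring the qualitative pattern described above.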

Peeking Inside the Black Box

To understand what guided the system’s decisions, the team used a modern explanation tool that assigns an influence score to each voice feature. They discovered that the most important acoustic traits were not the same for every group. Age, detailed spectral shapes, and pitch-related measures all mattered, but in different combinations for Parkinson’s disease, COPD, and healthy controls. For example, certain spectral descriptors and formant patterns were more influential in COPD, while particular spectral and pitch cues played a stronger role in Parkinson’s disease. This pattern suggests that the model truly learned disease-specific aspects of how people produce a sustained vowel, instead of just detecting that a voice sounds “unusual.”
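The article does not name the explanation tool. Many such tools are built on Shapley values, so the sketch below uses exact Shapley attribution as a stand-in; that choice, the toy scoring function, and the feature names `age`, `mfcc_3`, and `pitch_sd` are all assumptions for illustration and say nothing about the study's actual model.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley attribution: average each feature's marginal
    contribution over all orderings, with absent features held at
    their baseline values. Only practical for a handful of features."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        prev = model(current)
        for name in order:
            current[name] = x[name]       # "switch on" this feature
            value = model(current)
            totals[name] += value - prev  # its marginal contribution
            prev = value
    return {name: total / len(orderings) for name, total in totals.items()}

# Hypothetical toy score with an interaction term, NOT the study's model.
def toy_score(f):
    return 0.02 * f["age"] + 0.5 * f["mfcc_3"] - 0.3 * f["pitch_sd"] * f["mfcc_3"]

x = {"age": 70.0, "mfcc_3": 1.2, "pitch_sd": 2.0}         # one made-up recording
baseline = {"age": 65.0, "mfcc_3": 0.0, "pitch_sd": 0.0}  # made-up reference point

phi = shapley_values(toy_score, x, baseline)
for name, value in phi.items():
    print(name, round(value, 4))
```

The influence scores always add up to the difference between the model's output for the recording and its output for the baseline, which is what lets each feature's share be read as a contribution to that particular prediction.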

What This Could Mean for Everyday Care

In simple terms, this work shows that a short, sustained “ah” recorded on an ordinary mobile device can contain enough information for a carefully designed machine learning system to distinguish between brain-related and lung-related voice problems and normal aging voices. The approach does not replace a medical diagnosis, and larger, more diverse studies are needed, but it points toward a future where quick, non-invasive voice checks could support clinicians in screening and monitoring people with Parkinson’s disease or COPD, even across different languages and settings.

Citation: Idrisoglu, A., Behrens, A. Use of machine learning and voice for multiclass classification of Parkinson’s disease, chronic obstructive pulmonary disease, and healthy controls. Sci Rep 16, 15485 (2026). https://doi.org/10.1038/s41598-026-53409-3

Keywords: Parkinson’s disease, COPD, voice biomarker, machine learning, mobile health