Clear Sky Science

Perceptual boundary of vowel quantity: a perceptual study of synthesized Arabic vowels

Why tiny slices of time matter in speech

When we hear someone speak Arabic, we rarely notice how long each vowel lasts. Yet small differences in timing can completely change a word’s meaning—much like the difference between “bit” and “beat” in English. This paper asks a deceptively simple question: exactly how long does a vowel need to be before native speakers of two major Arabic dialects hear it as “long” rather than “short”? By answering that, the study reveals how our ears carve up continuous sound into the distinct building blocks of language.

Figure 1.

Short and long sounds that change meaning

Arabic uses vowel length as a core part of its sound system: pairs like /a/ and a longer /aː/ can distinguish completely different words. Earlier work has measured how long these vowels tend to be when people talk, showing that long vowels are usually about one‑and‑a‑half to three times the length of short ones. But those studies focused on how vowels are produced, not how they are heard. This study turns the question around: at what point along a gradual increase in duration do listeners switch from hearing a vowel as short to hearing it as long—and does that switching point look the same for speakers of different Arabic dialects?

Two dialects under the microscope

The researcher compared native listeners of Najdi Arabic, spoken in central Saudi Arabia, and Cairene Arabic, the dominant dialect of Cairo. Both varieties share the same basic set of three short vowels /a, i, u/ and three long vowels /aː, iː, uː/. To focus purely on timing, the study used carefully edited recordings of three minimal word pairs (for example, a short‑vowel word meaning “he wrote” versus a long‑vowel word meaning “he corresponded”). Starting from naturally long vowels, the author shortened their duration in small steps, using software that preserved pitch and sound quality. This created smooth series of vowels that ranged from clearly long to clearly short without introducing audible artifacts.
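The stepwise shortening described above can be pictured as a list of target durations plus the time‑compression factor needed to reach each one. The following is only an illustrative sketch: the endpoint durations (180 ms and 60 ms), the number of steps (7), and the function names are hypothetical and are not taken from the study, which does not report them in this summary.

```python
def duration_continuum(long_ms, short_ms, n_steps):
    """Return n_steps evenly spaced target durations from long_ms down to short_ms."""
    if n_steps < 2:
        raise ValueError("need at least two steps (the two endpoints)")
    if not 0 < short_ms < long_ms:
        raise ValueError("expected 0 < short_ms < long_ms")
    step = (long_ms - short_ms) / (n_steps - 1)
    return [long_ms - i * step for i in range(n_steps)]


def compression_factors(targets_ms, original_ms):
    """Fraction of the original vowel duration kept at each step (1.0 = unchanged)."""
    return [t / original_ms for t in targets_ms]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    targets = duration_continuum(long_ms=180, short_ms=60, n_steps=7)
    for target, factor in zip(targets, compression_factors(targets, 180)):
        print(f"{target:6.1f} ms  keeps {factor:.2f} of the original vowel")
```

A pitch‑preserving time‑scaling tool would then be applied to the vowel portion of each recording with the corresponding factor; that step is outside this sketch.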

Listening in and choosing between two words

Forty adult participants (twenty Najdi speakers and twenty Cairene speakers) completed an online listening task. After a brief familiarization phase with the original, unaltered word pairs, each person heard the manipulated versions one by one. For every item, they had to decide which word they heard: the version with the long vowel or the one with the short vowel. They could replay a sound before answering, but once they responded, they could not go back and change their choice. Using statistical models that take into account both the specific word and the individual listener, the researcher traced how the probability of a “long” response rose as vowel duration, measured in milliseconds, increased.
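To make the idea of tracing “long” responses against duration concrete, here is a minimal sketch that fits a plain logistic curve to simulated forced‑choice data and reads off the duration at which “long” and “short” are equally likely. It is a simplification under stated assumptions: the study's models also account for the specific word and the individual listener, which this sketch omits, and the simulated boundary (100 ms), slope (0.15 per ms), step grid, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_listener(durations_ms, true_boundary_ms, slope_per_ms, repeats, rng):
    """Simulate forced-choice trials as (duration, response) pairs; 1 means "long"."""
    trials = []
    for d in durations_ms:
        p_long = sigmoid(slope_per_ms * (d - true_boundary_ms))
        for _ in range(repeats):
            trials.append((d, 1 if rng.random() < p_long else 0))
    return trials


def fit_logistic(trials, iterations=25):
    """Fit P(long) = sigmoid(b0 + b1 * duration) by Newton-Raphson.

    Durations are centered internally for numerical stability; the returned
    intercept and slope refer to the raw millisecond scale.
    """
    mean_d = sum(d for d, _ in trials) / len(trials)
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for d, y in trials:
            x = d - mean_d
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0 - b1 * mean_d, b1


def crossover_ms(intercept, slope):
    """Duration at which the fitted probability of a "long" response is 0.5."""
    return -intercept / slope


if __name__ == "__main__":
    rng = random.Random(0)
    trials = simulate_listener(range(60, 141, 10), 100.0, 0.15, 100, rng)
    b0, b1 = fit_logistic(trials)
    print(f"estimated boundary: {crossover_ms(b0, b1):.1f} ms, slope: {b1:.3f} per ms")
```

In practice a mixed‑effects logistic model from a statistics package would be the natural tool for data with repeated listeners and words; the hand‑rolled fit here only illustrates the shape of the analysis.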

Where listeners draw the line in time

The results show that duration is a powerful cue for all three vowels, but that the precise boundary between short and long depends on both vowel type and dialect. For the high front vowel [i], Cairene listeners began hearing the vowel as long at shorter durations (around 84 milliseconds), while Najdi listeners typically needed about 96 milliseconds before switching to “long.” Cairene listeners also changed their judgments more abruptly along the timing scale, suggesting a sharper, more categorical boundary. For the low vowel [a], both groups shared almost exactly the same boundary, near 101 milliseconds, though Cairene listeners again showed a steeper, more decisive shift. For the back vowel [u], the boundaries were close (about 100 milliseconds for Najdi listeners and 110 for Cairene listeners), and the difference was not statistically significant.
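One common way to read boundaries and steepness like those reported above is through a logistic curve relating vowel duration d (in milliseconds) to the probability of a “long” response. The following is a sketch under that assumption; the symbols β0 (intercept), β1 (slope), d* (boundary), and Δd (transition width) are introduced here for illustration and do not come from the paper.

```latex
P(\text{long} \mid d) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 d)}}

% Boundary: the duration at which "long" and "short" are equally likely
P(\text{long} \mid d^{*}) = \tfrac{1}{2}
\;\Longleftrightarrow\; \beta_0 + \beta_1 d^{*} = 0
\;\Longleftrightarrow\; d^{*} = -\frac{\beta_0}{\beta_1}

% Width of the transition between 25% and 75% "long" responses
\Delta d = \frac{\ln 3 - (-\ln 3)}{\beta_1} = \frac{2 \ln 3}{\beta_1}
```

Under this reading, a boundary such as 84 or 96 milliseconds corresponds to d*, and a “steeper, more decisive shift” corresponds to a larger β1 and therefore a narrower Δd, independently of where the boundary itself falls.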

Figure 2.

What this tells us about hearing speech

To a layperson, these tens of milliseconds may seem trivial, but they reveal how finely tuned our hearing is to the sound patterns of our own dialect. The study shows that Najdi and Cairene speakers agree on the general timing needed to mark a vowel as long, especially for [a] and [u], yet they calibrate that timing differently for [i]. It also shows that individuals vary: some listeners treat the short‑to‑long change as a sharp step, others as a more gradual shift. Together, these findings support the idea that sound categories are not rigid, universal boxes. Instead, our experience with a particular dialect shapes the exact temporal thresholds that our brains use to turn a flowing stream of sound into meaningful words.

Citation: Alfaifi, A. Perceptual boundary of vowel quantity: a perceptual study of synthesized Arabic vowels. Humanit Soc Sci Commun 13, 271 (2026). https://doi.org/10.1057/s41599-025-06454-8

Keywords: Arabic vowels, vowel length, speech perception, dialect variation, phonetics