Clear Sky Science

Humans can use positive and negative spectrotemporal correlations to detect rising and falling pitch

How Our Brains Hear Notes Moving Up and Down

When you recognize a question in someone’s voice or follow the melody of your favorite song, your ears and brain are tracking how pitch rises and falls over time. This study asks a surprising question: do our brains do this using the same kinds of motion-detection tricks that our eyes use to see movement? By carefully designing new sounds and brain-imaging tests, the authors show that people can hear pitch motion even in sounds with no clear musical notes, revealing a new kind of auditory illusion and a shared algorithm between hearing and vision.

Figure 1.

Hearing Motion Without Clear Notes

In everyday sound, rising and falling pitch is often tied to a clear “fundamental frequency” — the basic note we would sing or play on an instrument. But the authors created special sounds that deliberately lacked this obvious pitch information. Instead of stable tones, they used dense clouds of many frequencies whose loudness changed in coordinated ways over time. These patterns created local relationships between neighboring frequencies and moments in time, known as spectrotemporal correlations. Listeners heard each sound for two seconds and simply reported whether, overall, it seemed to be going up or down in pitch.

A New Hearing Illusion That Flips Direction

When neighboring frequencies tended to get louder or softer together along an upward diagonal in the frequency–time grid, people reliably reported that the sound’s pitch was rising. When the diagonal pointed downward, they reported falling pitch. The surprise came when the researchers reversed the pattern: they made neighboring frequencies alternate, so that when one got louder the other got softer — a “negative” correlation. In this case, an upward-tilted pattern was heard as pitch falling, and a downward-tilted one was heard as rising. This is the sound equivalent of a well-known visual illusion called “reverse-phi,” in which a moving pattern that keeps flipping contrast appears to move in the opposite direction. The strength of the pitch motion people heard depended smoothly on how strongly these correlations were present, and the effect worked even when the information was split across the two ears, showing that the brain combines signals from both sides.
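To make the direction flip concrete, here is a minimal Python sketch of the idea. It is not the authors' stimulus code. It assumes a simplified grid of loud and quiet cells (frequency by time) and a simple "opponent correlator" that compares upward-diagonal and downward-diagonal products, a textbook scheme borrowed from models of visual motion. The function names `correlated_grid` and `opponent_correlator` and the grid size are illustrative choices, not taken from the study.

```python
import numpy as np

def correlated_grid(n_freq, n_time, direction, parity, rng):
    """Frequency x time grid with a pairwise correlation along one diagonal.

    direction=+1 links cell (f, t) with (f+1, t+1): an upward tilt.
    direction=-1 links cell (f, t) with (f-1, t+1): a downward tilt.
    parity=+1 makes the linked cells vary together; parity=-1 makes them alternate.
    """
    base = rng.choice([-1.0, 1.0], size=(n_freq, n_time))
    # shifted[f, t] == base[f - direction, t - 1] (edges wrap around)
    shifted = np.roll(base, shift=(direction, 1), axis=(0, 1))
    return base + parity * shifted

def opponent_correlator(grid):
    """Mean upward-diagonal product minus mean downward-diagonal product."""
    later = np.roll(grid, -1, axis=1)                # later[f, t] == grid[f, t+1]
    up = np.mean(grid * np.roll(later, -1, axis=0))  # grid[f, t] * grid[f+1, t+1]
    down = np.mean(grid * np.roll(later, 1, axis=0)) # grid[f, t] * grid[f-1, t+1]
    return up - down

rng = np.random.default_rng(0)
for direction in (+1, -1):
    for parity in (+1, -1):
        grid = correlated_grid(64, 400, direction, parity, rng)
        signal = opponent_correlator(grid)
        percept = "rising" if signal > 0 else "falling"
        print(f"tilt={direction:+d} parity={parity:+d} -> {percept}")
```

In this toy model the sign of the output follows the product of tilt and parity, so an upward tilt with negative correlation reads as "falling", mirroring the reversal listeners reported.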

Tuning In to Tiny Shifts in Frequency and Time

To probe the details of this mechanism, the team moved from dense noise to sparse “pip” sounds: brief beeps scattered across frequency and time. They created pairs of pips that were separated by a small jump in frequency and a short delay, and again controlled whether the two were loud together, quiet together, or opposite in loudness. By varying the delay and the size of the frequency jump, they found that people were most sensitive to pitch direction when the second pip followed about 40 milliseconds later and shifted by only about one-fifteenth of an octave — a very small change. Crucially, listeners were sensitive not just to loud–loud pairs, but to all four combinations of loud and quiet. They also heard motion in more complex three-pip patterns that contain no simple pairwise regularities, echoing similar findings in animal vision. All of this points to a system that reads out fine-grained local patterns of change rather than tracking long-lived tones.
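For a sense of scale, the short Python sketch below works out what a one-fifteenth-octave step means in hertz. The 40 millisecond delay and the step size come from the description above. The 1000 Hz starting frequency and the helper name `pip_pair` are assumptions chosen only for illustration.

```python
def pip_pair(first_hz, direction=+1, octave_step=1 / 15, delay_s=0.040):
    """Return (onset time in seconds, frequency in Hz) for two pips.

    direction=+1 places the second pip higher, direction=-1 lower.
    """
    second_hz = first_hz * 2.0 ** (direction * octave_step)
    return [(0.0, first_hz), (delay_s, second_hz)]

up_pair = pip_pair(1000.0, direction=+1)
down_pair = pip_pair(1000.0, direction=-1)
step_in_cents = 1200 / 15  # an octave spans 1200 cents

print(up_pair)
print(down_pair)
print(step_in_cents)
```

A step of one-fifteenth of an octave is 80 cents, a little under one semitone, which corresponds to a frequency change of roughly 4.7 percent.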

Figure 2.

Brain Signatures of Opposing Pitch Detectors

The researchers next asked how this computation might be organized in the brain. Using functional MRI, they measured activity in the auditory cortex while people listened to simple rising tones, falling tones, or a mixture of the two played at once. If the brain used separate sets of neurons tuned to upward and downward pitch motion that oppose each other, then the combined stimulus should partially cancel out their activity. This is exactly what they observed: several regions on both sides of auditory cortex responded strongly to rising and to falling tones alone, but less to the mixture. This “opponent” pattern closely matches motion-processing circuits known from the visual system and naturally explains why flipping the correlation in the sounds flips the perceived direction.
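The logic of this test can be illustrated with a toy model in Python. This is a sketch of the reasoning only, not a model fitted to the imaging data: it assumes two populations, one driven by rising pitch and one by falling pitch, that inhibit each other. The function name `population_response` and the inhibition strength of 0.75 are hypothetical values picked for the example.

```python
def population_response(up_drive, down_drive, inhibition=0.75):
    """Summed activity of two mutually inhibiting populations.

    Each population is excited by its preferred direction, suppressed by
    the opposite direction, and cannot fire below zero.
    """
    up_units = max(up_drive - inhibition * down_drive, 0.0)
    down_units = max(down_drive - inhibition * up_drive, 0.0)
    return up_units + down_units

rising = population_response(1.0, 0.0)
falling = population_response(0.0, 1.0)
mixture = population_response(1.0, 1.0)
no_opponency = population_response(1.0, 1.0, inhibition=0.0)

print(rising, falling, mixture, no_opponency)
```

With inhibition switched on, the mixture drives less total activity than either tone alone. With inhibition set to zero, the mixture drives more, which is why a weaker response to the mixture points toward opponent populations.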

From Lab Illusions to Everyday Speech and Music

Finally, the team asked whether these abstract patterns actually matter in real life. Analyzing hours of English and Mandarin speech, they converted each recording into a time–frequency map and measured how tones were moving up or down, using an algorithm similar to those used for visual motion. They then looked for the same four local intensity patterns studied in the lab. In both languages, patterns where neighboring frequencies changed together tended to coincide with pitch moving in the direction of the pattern, while alternating patterns predicted motion in the opposite direction. In other words, both positive and negative spectrotemporal correlations in natural speech reliably signal how pitch is changing. The findings suggest that the auditory system’s sensitivity to these subtle local patterns, including those that create illusions in the lab, is not a quirk but an efficient way to decode meaning and melody from the complex soundscapes of everyday life.
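The general recipe (turn sound into a time–frequency map, then compare local intensity patterns along upward and downward diagonals) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' analysis pipeline: it uses a synthetic pitch glide instead of recorded speech, a basic non-overlapping spectrogram, and a single net correlation score rather than the four separate patterns. The names `spectrogram` and `net_direction`, the 8000 Hz sample rate, and the glide parameters are all illustrative choices.

```python
import numpy as np

def spectrogram(signal, window=256):
    """Magnitude spectrogram from non-overlapping Hann-windowed frames.

    Returns an array shaped (frequency bins, time frames).
    """
    n_frames = len(signal) // window
    frames = signal[: n_frames * window].reshape(n_frames, window)
    spectrum = np.fft.rfft(frames * np.hanning(window), axis=1)
    return np.abs(spectrum).T

def net_direction(spec):
    """Upward-diagonal minus downward-diagonal correlation of intensity."""
    contrast = spec - spec.mean()  # louder or quieter than average
    up = np.mean(contrast[:-1, :-1] * contrast[1:, 1:])    # (f, t) with (f+1, t+1)
    down = np.mean(contrast[1:, :-1] * contrast[:-1, 1:])  # (f, t) with (f-1, t+1)
    return up - down

rate = 8000                      # samples per second (assumed)
t = np.arange(rate) / rate       # one second of time stamps
# Glide from 500 Hz to 1500 Hz, about one frequency bin per frame
rising_glide = np.sin(2 * np.pi * (500 * t + 0.5 * 1000 * t**2))
falling_glide = rising_glide[::-1]

rising_score = net_direction(spectrogram(rising_glide))
falling_score = net_direction(spectrogram(falling_glide))
print("rising glide score > 0:", rising_score > 0)
print("falling glide score < 0:", falling_score < 0)
```

The score is positive when loud cells line up along an upward diagonal and negative when they line up along a downward one, so its sign tracks the direction of the glide.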

Citation: Vaziri, P.A., McDougle, S.D. & Clark, D.A. Humans can use positive and negative spectrotemporal correlations to detect rising and falling pitch. Nat Hum Behav 10, 417–433 (2026). https://doi.org/10.1038/s41562-025-02371-7

Keywords: pitch perception, auditory motion, speech intonation, auditory cortex, sensory illusions