Clear Sky Science
Narrative context shifts gaze from visual to semantic salience
Why our eyes don’t just follow the brightest thing
When you look at a picture, your eyes jump around in quick movements, landing briefly on different parts of the scene. It might seem obvious that your gaze is drawn to whatever is most colorful or high-contrast. But in everyday life we are usually following stories—watching a movie, reading comics, scrolling through photos—and trying to make sense of what is happening. This study asks a simple but powerful question: as a story unfolds, do our eyes keep chasing the flashiest bits, or do they shift toward the parts that matter most for understanding the plot?

Watching wordless picture stories
The researchers invited adults to view short, wordless picture stories about a boy and his animal friends. Each story was made up of 24 hand-drawn images that, in their original order, form a clear beginning, middle, and end. Sometimes participants saw the pictures in this proper sequence, so that a coherent story could be built in their minds. Other times, the very same images were shuffled into a random order, scrambling the storyline while keeping the visual content identical. Throughout, people were simply told to look at the pictures freely while their eye movements were recorded with high-precision tracking equipment.

Measuring what is visually striking versus what is meaningful
To understand what aspects of each image pulled the eyes, the team compared two very different kinds of “importance.” First, they estimated visual salience—how much an object stands out purely because of its image properties, such as contrast and edges—using advanced computer-vision models that predict where people tend to look in single pictures. Second, they estimated semantic salience—how important an object is for understanding the story. To do this, separate volunteers wrote short narratives describing each picture sequence in coherent order. A large language model (a modern AI system trained on text) was then used to compute how surprising each word in these narratives was, given the prior context, and those surprise scores were mapped onto specific objects in the pictures (for example, the jealous frog that suddenly bites another frog).
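To make the idea of word "surprise" concrete, here is a minimal sketch in Python. It assumes the language model has already supplied a probability for each word given the words before it. The probabilities, the word-to-object mapping, and the choice to average surprise per object are all hypothetical illustrations, not the authors' actual pipeline. Surprise is computed in the standard way, as the negative base-2 logarithm of the probability, so rarer words get higher scores.

```python
import math
from collections import defaultdict


def surprisal_bits(prob):
    """Surprisal of a word in bits: -log2 P(word | preceding context)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def object_semantic_salience(word_probs, word_to_object):
    """Average the surprisal of all words that refer to each object.

    Averaging is an assumption made for this sketch; the article does not
    say how word-level scores were combined per object.
    """
    per_object = defaultdict(list)
    for word, prob in word_probs:
        obj = word_to_object.get(word)
        if obj is not None:  # skip words that name no object ("the", "a", ...)
            per_object[obj].append(surprisal_bits(prob))
    return {obj: sum(vals) / len(vals) for obj, vals in per_object.items()}


# Hypothetical probabilities a language model might assign to each word.
word_probs = [("the", 0.5), ("frog", 0.25), ("bites", 0.03125), ("boy", 0.5)]
# Hypothetical mapping from narrative words to objects in the picture.
word_to_object = {"frog": "frog", "bites": "frog", "boy": "boy"}

salience = object_semantic_salience(word_probs, word_to_object)
print(salience)  # {'frog': 3.5, 'boy': 1.0}
```

In this toy example the frog scores higher than the boy because the word "bites" was unlikely given what came before, which is the intuition behind treating surprising story elements as semantically salient.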

How story order changes where and when we look
With these measures in hand, the authors examined two aspects of gaze: how often each object was fixated, and how quickly it attracted the first look. Across conditions, strongly visually salient objects were, unsurprisingly, looked at more and earlier than other parts of the image. But the key finding emerged when comparing coherent and shuffled story order. When pictures formed a meaningful sequence, viewers looked relatively more often at semantically important objects—the ones that carried narrative weight—than when the same images were scrambled. They also tended to look at these meaningful objects earlier in time within each five-second viewing period. In contrast, the advantage of visually striking objects did not increase in coherent stories; if anything, their early dominance faded more quickly when a sensible narrative could be constructed.
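The two gaze measures described above, how often an object is fixated and how soon it gets its first look, can be sketched in a few lines of Python. The fixation records, object labels, and function name below are hypothetical, and the real analysis in the paper is likely to involve more steps (for example, statistical modeling across participants).

```python
from collections import Counter


def gaze_measures(fixations):
    """Return (fixation counts, time of first fixation) per object.

    fixations: list of (onset_ms, object_label) pairs for one image,
    where onset_ms is measured from the moment the image appeared.
    """
    counts = Counter()
    first_look = {}
    for onset_ms, obj in sorted(fixations):  # process in temporal order
        counts[obj] += 1
        first_look.setdefault(obj, onset_ms)  # keep only the earliest onset
    return dict(counts), first_look


# Hypothetical fixations within one five-second (5000 ms) viewing period.
fixations = [
    (180, "dog"),
    (450, "frog"),
    (720, "dog"),
    (1010, "frog"),
    (1300, "frog"),
]

counts, first_look = gaze_measures(fixations)
print(counts)      # {'dog': 2, 'frog': 3}
print(first_look)  # {'dog': 180, 'frog': 450}
```

Comparing these per-object numbers between coherent and shuffled sequences is the kind of contrast the study draws: an object that gains fixations, or is reached sooner, when the story makes sense is being favored for its meaning rather than its appearance.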

Time course of shifting attention
The study also tracked how this balance changed over successive eye movements. The very first couple of fixations after each new image appeared were strongly driven by visual salience, regardless of context: the eyes initially snapped to the physically prominent parts of the scene. But as viewing continued, especially once several fixations had occurred, a divergence appeared. In scrambled sequences, people kept favoring visually salient regions. In coherent sequences, their eyes increasingly shifted toward semantically important objects that helped update their internal model of the unfolding story. This pattern held not only for the single most salient object, but across all objects in a scene: in coherent stories, semantic importance better predicted both how often and how quickly objects were fixated.

What this reveals about how we understand scenes
These results suggest that our eyes are not mere slaves to brightness and contrast. Instead, they serve our curiosity and understanding. At first glance, we sample the visually loudest parts of a scene, but within a few fixations, our internal sense of “what is going on here?” begins to steer our gaze toward the pieces that matter for the story, even if those pieces are visually plain, like a nondescript door or an annoyed frog. By combining eye tracking, image-based models, and language-based AI, the study shows that narrative context reshapes how we explore pictures. In everyday life, this means that eye movements offer a window not only into what we see, but also into the invisible story we are constructing in our minds.

Citation: Berlot, E., Schmitt, LM., Huber-Huber, C. et al. Narrative context shifts gaze from visual to semantic salience. Commun Psychol 4, 59 (2026). https://doi.org/10.1038/s44271-026-00426-7
Keywords: eye movements, visual attention, story perception, semantic salience, language models