Clear Sky Science

Hybrid neural–cognitive models reveal how memory shapes human reward learning

Why past experiences matter for everyday choices

Each time you decide which route to drive, which snack to buy or which website to click, you are quietly learning from past rewards and disappointments. Psychologists have long described this learning with simple formulas that average past outcomes into a single score for each option. This study asks whether such stripped-down accounts are enough to explain how real people actually learn from rewards, and it uses modern neural networks to uncover a richer picture of how memory shapes our choices.

From simple scores to richer memories

Classic models of reward learning, known as reinforcement learning models, assume that each option you can choose is tagged with a single running value that is updated a little bit after every outcome. Pick a snack, get 70 points, and the internal value for that snack creeps upward; get 10 points, and it slides down. These models have been very influential, linking behaviour and brain activity in many species. Yet scattered findings hint that they may be too simple. People can give special weight to particular past events, seem sensitive to the overall range of rewards they have seen, and show brain signals that do not line up neatly with a single running value.
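
The running-value idea can be made concrete with a few lines of code. What follows is a minimal sketch of the textbook "delta rule" update, not the exact model fitted in the study; the function name, the starting value of 50 and the learning rate of 0.1 are all assumptions chosen for illustration.

```python
def update_value(value, reward, learning_rate=0.1):
    """Move the running value a fraction of the way toward the latest reward.

    learning_rate is a hypothetical setting; real models fit it to data.
    """
    prediction_error = reward - value  # how much better or worse than expected
    return value + learning_rate * prediction_error


# Illustrative numbers only: start from an assumed value of 50 points.
value = 50.0
after_win = update_value(value, 70)   # a 70-point outcome nudges the value up
after_loss = update_value(value, 10)  # a 10-point outcome nudges it down
```

Because only one number per option is stored, everything the learner knows about the past is compressed into that single score, which is exactly the simplification the study puts to the test.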

A large online game of chance

To probe these issues, the researchers asked more than 800 online volunteers to play a computer game hundreds of times. On each trial, players chose one of four coloured options and immediately saw how many points they had won. Unknown to them, the true payoffs slowly drifted over time, so that the best option at the start of a game might be mediocre later on. Across more than six hundred thousand trials, people generally learned to favour the more rewarding choices, but their detailed patterns of switching, streaks and exploration contained far more structure than simple models could capture.
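
A game of this kind is easy to imitate. The sketch below simulates four options whose average payoffs wander slowly from trial to trial. The article does not give the drift rule, so the Gaussian random walk, the 0 to 100 point range, the step size and the function name are assumptions, not details of the actual experiment.

```python
import random


def simulate_drifting_payoffs(n_trials=200, n_options=4, drift_sd=3.0, seed=0):
    """Return a list of per-trial payoff means, one row per trial.

    Assumed for illustration: means start between 20 and 80 points, take a
    small Gaussian step each trial, and are clipped to the range 0..100.
    """
    rng = random.Random(seed)
    means = [rng.uniform(20, 80) for _ in range(n_options)]
    history = []
    for _ in range(n_trials):
        history.append(list(means))  # record the means before they drift
        means = [min(100.0, max(0.0, m + rng.gauss(0, drift_sd))) for m in means]
    return history


payoffs = simulate_drifting_payoffs()
# Index of the most rewarding option at the start and at the end of the game.
best_first = max(range(4), key=lambda i: payoffs[0][i])
best_last = max(range(4), key=lambda i: payoffs[-1][i])
```

Comparing best_first with best_last across different seeds shows why players cannot simply find the best option once and stick with it.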

Figure 1. How rich memories of past rewards guide our everyday choices among changing options

Blending human-readable models with neural networks

The team compared several ways of describing this behaviour. At one extreme was a carefully tuned traditional model that used a handful of numbers to track option values and a simple tendency to repeat or switch actions. At the other extreme was a flexible recurrent neural network, a kind of artificial brain that can store rich information about the past in its internal state but is usually hard to interpret. As expected, the neural network predicted people’s choices far better than the classic model. The key step was then to build hybrid models that kept the transparent structure of the classic approach, but replaced individual pieces with small neural networks that could, in principle, learn any rule that fit the data.
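
To give a feel for the traditional end of this spectrum, here is a minimal sketch of how option values and a repeat tendency can be turned into choice probabilities. It uses a standard softmax rule with a "stickiness" bonus for the previous choice; the article does not specify the classic model's equations, so this form, the function name and the parameter values are assumptions.

```python
import math


def choice_probabilities(values, last_choice=None, inverse_temp=0.1, stickiness=0.5):
    """Softmax over option values, with a bonus for repeating the last choice.

    inverse_temp and stickiness are hypothetical parameters; a negative
    stickiness would instead favour switching.
    """
    logits = [inverse_temp * v for v in values]
    if last_choice is not None:
        logits[last_choice] += stickiness
    max_logit = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(logit - max_logit) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Four equally valued options; the player picked option 2 on the previous trial.
probs = choice_probabilities([50, 50, 50, 50], last_choice=2)
```

In a hybrid model of the kind described above, a hand-written piece such as the fixed stickiness bonus is the sort of component that could be swapped for a small neural network while the rest of the structure stays readable.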

Discovering hidden memory states

The first hybrids allowed for more flexible updating of option values and for sensitivity to the context of unchosen options, but these additions still fell short of the full neural network. The decisive advance came with a model called Memory-ANN. Here, the system kept distinct memory variables that stored a rich summary of past rewards and actions, separate from the simpler variables that directly drove choice. These memory variables were implemented with compact recurrent networks inside the model. When fitted to the data, Memory-ANN matched the predictive power of the opaque neural network while remaining interpretable. Analysis showed that its memory tracked both recent and long-term reward history at multiple timescales, and adjusted how strongly new rewards influenced future choices.
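
The general idea of memory that tunes learning can be illustrated with a toy example. The sketch below is a hand-written stand-in, not Memory-ANN itself: the real model learns its memory with small recurrent networks, whereas here two fixed reward averages (one fast, one slow) play that role, and the rule linking them to the learning rate is invented purely for illustration. All names and numbers are assumptions.

```python
import math


class MemoryModulatedLearner:
    """Toy learner whose update strength depends on separate memory traces."""

    def __init__(self, n_options=4, fast_decay=0.5, slow_decay=0.95, base_rate=0.1):
        self.values = [50.0] * n_options  # variables that would drive choice
        self.fast_trace = 50.0            # memory of recent rewards
        self.slow_trace = 50.0            # memory of long-run rewards
        self.fast_decay = fast_decay
        self.slow_decay = slow_decay
        self.base_rate = base_rate

    def update(self, choice, reward):
        # Update both memory traces, each on its own timescale.
        self.fast_trace = self.fast_decay * self.fast_trace + (1 - self.fast_decay) * reward
        self.slow_trace = self.slow_decay * self.slow_trace + (1 - self.slow_decay) * reward
        # Assumed rule: learn faster when recent and long-run history disagree.
        mismatch = abs(self.fast_trace - self.slow_trace) / 100.0
        rate = self.base_rate + (1 - self.base_rate) * math.tanh(mismatch)
        # Apply the memory-dependent rate to the chosen option's value.
        self.values[choice] += rate * (reward - self.values[choice])
        return rate


learner = MemoryModulatedLearner()
calm_rate = learner.update(0, 50)       # reward matches history: base rate
surprised_rate = learner.update(0, 100) # sudden jump: rate rises
```

The point of the sketch is the separation of roles: the traces never drive choice directly, they only change how strongly the next reward moves the option values.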

Figure 2. How layered memories combine many past rewards to tune future choices step by step

What this means for how we learn from rewards

The findings suggest that human reward learning cannot be fully described as slowly adjusting a single score for each option. Instead, our brains seem to maintain richer internal records of what happened when, and use these records to tune how strongly we react to new wins and losses. The work shows that combining classic cognitive theories with neural networks can reveal this hidden structure, offering models that both fit large datasets and shed light on the mental processes that guide everyday decisions.

Citation: Eckstein, M.K., Summerfield, C., Daw, N.D. et al. Hybrid neural–cognitive models reveal how memory shapes human reward learning. Nat Hum Behav 10, 972–987 (2026). https://doi.org/10.1038/s41562-025-02324-0

Keywords: reward learning, human decision making, memory, reinforcement learning models, recurrent neural networks