Clear Sky Science

Behaviorally informed deep reinforcement learning for portfolio optimization with loss aversion and overconfidence


Why our emotions matter in automated investing

Most people know that fear and overconfidence can sway their investment choices, but we tend to assume that computer-driven trading is perfectly rational. This study challenges that idea by showing that even automated systems can benefit from "human-like" traits. By carefully building loss aversion (dislike of losses) and overconfidence into a modern artificial intelligence trading system, the authors find that portfolios can become more resilient in crashes and more effective in booms—across both cryptocurrencies and blue-chip stocks.

Figure 1.

Teaching trading robots about fear and boldness

The researchers start from a powerful branch of AI called deep reinforcement learning, where a software agent learns by trial and error how to rebalance a portfolio over time. In standard versions, the agent behaves like a textbook rational investor: it looks at prices and indicators and chooses portfolio weights that it thinks will pay off in the long run. Here, that neutral agent still exists, but it is wrapped in a behavioral layer that mimics two well-documented investor tendencies: loss aversion (reacting more strongly to losses than to equal gains) and overconfidence (placing too much faith in one’s own forecasts). Rather than changing what to buy or sell, these behavioral rules change how big each position should be once the neutral agent has chosen a direction.

How the behavioral safety belt and turbocharger work

In the loss-averse mode, the system pays special attention to unrealized losses on each asset. When the loss on a holding exceeds a preset threshold, the framework automatically cuts overall risk and shifts part of the portfolio toward cash, while modestly tilting toward beaten-down assets, much as many human investors do. In the overconfident mode, by contrast, strong gains trigger larger position sizes and even some leverage, so the system rides trends more aggressively and occasionally "doubles down" after sharp drops if it expects a rebound. Importantly, in all cases the reinforcement learning core decides which assets to hold; the behavioral module only dials the exposure up or down around that baseline.
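The idea of dialing exposure up or down around the neutral agent's choice can be illustrated with a small sketch. This is not the authors' implementation: the function name, the thresholds, and the scaling factors are all hypothetical, and the sketch leaves out the tilt toward beaten-down assets and the "doubling down" rule. It assumes long-only baseline weights that sum to at most 1, and it treats a negative cash weight as borrowing (leverage).

```python
def adjust_exposure(base_weights, unrealized_returns, mode,
                    loss_threshold=-0.10, gain_threshold=0.10,
                    risk_cut=0.5, boost=1.5, max_leverage=1.5):
    """Scale the neutral agent's weights and return (asset_weights, cash_weight).

    All parameter values are illustrative assumptions, not figures from the study.
    """
    if mode == "loss_averse":
        # Cut overall risk once any holding's unrealized loss breaches the threshold.
        triggered = any(r <= loss_threshold for r in unrealized_returns)
        scale = risk_cut if triggered else 1.0
    elif mode == "overconfident":
        # Enlarge positions once any holding shows a strong unrealized gain.
        triggered = any(r >= gain_threshold for r in unrealized_returns)
        scale = boost if triggered else 1.0
    elif mode == "neutral":
        scale = 1.0
    else:
        raise ValueError(f"unknown mode: {mode}")

    gross = sum(base_weights)
    if gross > 0:
        # Never let total exposure exceed the leverage cap.
        scale = min(scale, max_leverage / gross)
    else:
        scale = 1.0

    # Every weight is multiplied by the same factor, so the relative mix
    # chosen by the neutral agent is preserved; only the size changes.
    weights = [w * scale for w in base_weights]
    cash = 1.0 - sum(weights)  # negative cash means borrowed funds
    return weights, cash
```

With a baseline of 60% and 40% in two assets, a 20% unrealized loss on the first asset in loss-averse mode would halve both positions and move half the portfolio to cash, while a 20% gain in overconfident mode would scale both positions by 1.5 and borrow the difference.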

Letting the market mood pick the behavior

To decide when to be cautious or bold, the authors plug in a separate forecasting engine called TimesNet, a deep-learning model designed to uncover repeating patterns in time series. TimesNet looks at recent market data and predicts the next day’s overall return. If it expects a strong upswing, the overconfident agent is activated; if it foresees a downturn, the loss-averse agent takes over; and when the forecast is modest, the neutral agent remains in control. This regime switcher is trained strictly on past data in a walk-forward fashion to avoid any peek into the future, and it can be swapped out for other forecasters without altering the behavioral core.
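A rough sketch of the switching logic and the walk-forward discipline might look like the following. The threshold values, the function names, and the use of a rolling (rather than expanding) training window are assumptions made for illustration; the article does not give these details, and the forecast here is just a number supplied by whatever model plays the role of TimesNet.

```python
def select_mode(forecast_return, up_threshold=0.01, down_threshold=-0.01):
    """Map a next-day return forecast to a behavioral mode.

    The 1% thresholds are hypothetical placeholders.
    """
    if forecast_return >= up_threshold:
        return "overconfident"   # strong upswing expected
    if forecast_return <= down_threshold:
        return "loss_averse"     # downturn expected
    return "neutral"             # modest forecast: neutral agent stays in control


def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train_indices, test_indices) pairs where training data always
    precedes the test window, so the forecaster never sees the future."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll the window forward by one test block
```

Because `select_mode` only consumes a forecast value, the forecasting model behind it can be replaced without touching the behavioral rules, which mirrors the modularity described above.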

Figure 2.

Putting the behavior-aware system to the test

The team evaluates their Behavioral Bias–Aware Portfolio Trading (BBAPT) framework on two very different arenas: a 20-asset cryptocurrency basket from 2018 to 2024, and the changing list of Dow Jones Industrial Average stocks from 2008 to 2024. In crypto, where wild swings are common, loss aversion shines in choppy, range-bound markets by trimming exposure and limiting deep drawdowns, while overconfidence excels during strong bull runs by amplifying winners. Over the full period, the combined BBAPT system—using TimesNet to choose between neutral, loss-averse, and overconfident modes—delivers higher risk-adjusted performance than classic Markowitz portfolios, simple equal-weighted strategies, and reinforcement learning agents without behavioral tweaks.

Results that hold up in mature stock markets

In the long-running Dow Jones tests, which include the 2008 financial crisis, the COVID-19 crash, and the inflation shocks of 2022, the same patterns recur. All reinforcement learning–based strategies beat static portfolios on both returns and Sharpe ratio, a common measure of return per unit of risk. Within that group, the loss-averse configuration offers the smoothest ride with the smallest maximum losses, the overconfident configuration captures the highest raw gains at the cost of bigger swings, and the full BBAPT framework sits on the efficient frontier, pairing strong returns with moderated risk. The authors also adjust for changes in index membership to guard against survivorship bias, and find that the main conclusions remain intact.
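For readers who want to see how the two risk measures mentioned here are typically computed, the sketch below gives standard textbook versions of the annualized Sharpe ratio and the maximum drawdown from a series of periodic returns. The 252-period annualization is a common convention for daily stock data and is an assumption here (cryptocurrency markets trade every day, so 365 is often used instead); the study's exact settings are not stated in this summary.

```python
import math


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return per unit of volatility.

    Needs at least two returns with non-zero variance.
    """
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    # Sample variance (divides by n - 1).
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of cumulative wealth, as a negative fraction."""
    wealth, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        wealth *= 1.0 + r
        peak = max(peak, wealth)                  # highest wealth seen so far
        worst = min(worst, wealth / peak - 1.0)   # deepest drop from that peak
    return worst
```

A strategy with "the smallest maximum losses" in this sense is one whose `max_drawdown` is closest to zero, while a higher `sharpe_ratio` indicates more return earned per unit of risk taken.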

What this means for everyday investors

For non-specialists, the key message is that successful algorithmic trading does not have to ignore human psychology; it can harness it. By building carefully controlled versions of fear and boldness into an AI trader—and letting a forecasting model decide when each trait should dominate—the BBAPT framework creates portfolios that adapt to booms and busts in a more intuitive way. The work suggests a future in which "smart" trading systems are not just data-driven, but also behavior-aware, offering investors tools that are both more robust and easier to understand than black-box models that assume perfect rationality.

Citation: Charkhestani, A., Esfahanipour, A. Behaviorally informed deep reinforcement learning for portfolio optimization with loss aversion and overconfidence. Sci Rep 16, 6443 (2026). https://doi.org/10.1038/s41598-026-35902-x

Keywords: algorithmic trading, behavioral finance, reinforcement learning, portfolio optimization, cryptocurrency markets