Clear Sky Science

SVDHLA: symmetric variable depth hybrid learning automaton and its application


Teaching Machines to Know When to Stop Trying

Modern learning systems often face a simple but critical dilemma: how long should they keep trying the same choice before switching to something new? This paper tackles that question for a classic decision-making model and shows how giving the system a way to adjust its own persistence can make it faster, more reliable, and even helpful in training better neural networks.

Figure 1.

Why Classic Trial-and-Error Falls Short

The work builds on a long-standing idea called a learning automaton, a simple model that repeatedly chooses among several options and learns from rewards and penalties. A widely used version, known as L_{KN,K}, represents each option as a short ladder of internal states. The deeper the ladder, the more times the automaton must be punished before it abandons that option. A small depth makes the system change its mind quickly, encouraging exploration, while a large depth makes it stubborn, favoring exploitation of what seems to work. The catch is that this depth must be fixed in advance, even though the best setting depends heavily on the problem and can change over time. In stationary settings a poor choice slows learning; in shifting settings it can trap the system in outdated behavior or make it jittery and unstable.
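The ladder mechanics can be sketched in a few lines of Python. This is a minimal illustration of the general idea, not the paper's exact automaton: the class name, the "advance to the next option on failure" rule, and the level bookkeeping are all simplifying assumptions.

```python
import random

class LadderAutomaton:
    """Fixed-depth ladder automaton, an illustrative sketch.

    Each of the k options has a ladder of `depth` internal states.
    A reward pushes the automaton deeper into the current ladder;
    a penalty pushes it toward the top, and a penalty at the top
    makes it abandon the option for a neighboring one.
    """
    def __init__(self, k, depth):
        self.k = k
        self.depth = depth
        self.action = random.randrange(k)  # currently favored option
        self.level = 1                     # 1 = shallowest (about to switch)

    def choose(self):
        return self.action

    def update(self, rewarded):
        if rewarded:
            self.level = min(self.level + 1, self.depth)  # dig in
        elif self.level > 1:
            self.level -= 1                               # lose confidence
        else:
            self.action = (self.action + 1) % self.k      # give up, switch
            self.level = 1
```

With `depth=2` the automaton abandons an option after just two consecutive penalties from its deepest state; with `depth=10` it would tolerate ten, which is exactly the exploration-versus-stubbornness trade-off the fixed parameter locks in.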

A Self-Tuning Sense of Persistence

To overcome this rigidity, the authors introduce SVDHLA, short for Symmetric Variable Depth Hybrid Learning Automaton. Instead of locking the depth ahead of time, SVDHLA couples the classical ladder-based automaton to a second, smaller decision-maker whose sole job is to adjust how deep those ladders are. This helper chooses among three simple actions for the whole system: grow the depth of every option by one, shrink all depths by one, or stop and keep the current depth. It bases its decisions on how well the main automaton has been doing recently, summarized by how often it reaches the most favorable internal states versus how often it is forced to switch options. Over time, this creates a feedback loop: if the system is switching too much, the helper tends to increase depth and become more patient; if it is clinging to poor options, it tends to shrink depth and react faster.

Figure 2.

Putting the New Learner to the Test

The researchers tested SVDHLA in a variety of computer-simulated worlds. Some had fixed reward patterns; others changed unpredictably over time or punished frequently repeated choices. Across these scenarios, the new approach consistently earned more total reward and suffered less regret—that is, lost opportunity compared with an ideal decision-maker—than both the original model and a more recent hybrid variant. The key advantage is that SVDHLA can discover on its own whether it should behave cautiously or boldly, and adjust that stance as conditions change. Even in challenging cases with many possible actions and only one or two good ones, the system quickly settled into a useful range of depths instead of endlessly tinkering with its structure.
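The regret measure used in such comparisons has a standard form: the cumulative gap between the expected reward of the best option and that of whichever option was actually chosen. A minimal sketch, with illustrative function and argument names:

```python
def cumulative_regret(reward_probs, chosen):
    """Cumulative regret of a choice sequence (standard definition).

    reward_probs -- expected reward of each option
    chosen       -- option index picked at each step
    Each step contributes the gap between the best option's
    expected reward and the chosen option's expected reward.
    """
    best = max(reward_probs)
    return sum(best - reward_probs[a] for a in chosen)
```

A decision-maker that locks onto the best option quickly accumulates regret that flattens out; one stuck tinkering with its structure keeps paying the gap at every step.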

From Queues and Traffic to Neural Networks

To show that this is not just a toy improvement, the authors applied SVDHLA to two practical problems. First, they used it to decide which queue a server should handle next in a simulated computer system where tasks arrive and finish at uneven rates. Here, the adaptive depth helped the scheduler keep average waiting times lower than both traditional learning automata and popular bandit-style algorithms such as softmax, upper confidence bounds, and Thompson sampling. Second, they used SVDHLA as a controller for dropout in a neural network—the technique of randomly turning off units during training to avoid overfitting. Instead of using a fixed dropout rate, SVDHLA learned, batch by batch, whether to increase, decrease, or maintain the dropout level based on how the loss changed. This adaptive dropout produced slightly higher accuracy and more stable results on the MNIST digit-recognition task than an earlier learning-automaton-based controller.
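The dropout application boils down to a batch-by-batch controller with the same three moves: raise, lower, or keep the dropout rate depending on how the loss changed. The fixed rule below is an illustrative stand-in; in the paper the controller is the SVDHLA learner itself, and the step size and direction heuristics here are assumptions.

```python
def update_dropout(rate, prev_loss, curr_loss,
                   step=0.05, lo=0.0, hi=0.9):
    """Batch-by-batch dropout adjustment, an illustrative sketch.

    If the loss improved, ease off the regularization slightly so
    the network can fit more; if it worsened, regularize harder.
    The step size and this fixed rule are assumptions, not the
    paper's learned controller.
    """
    if curr_loss < prev_loss:
        return max(rate - step, lo)
    if curr_loss > prev_loss:
        return min(rate + step, hi)
    return rate
```

The appeal over a fixed dropout rate is the same as in the queue experiment: the right amount of persistence (here, regularization) is unknown in advance and drifts as training progresses, so letting a learner track it removes one hand-tuned knob.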

What This Means for Smarter Learning Systems

In everyday terms, SVDHLA gives a trial-and-error learner a self-tuning sense of how stubborn it should be. Rather than relying on a human engineer to guess the right balance between trying new options and sticking with old ones, the system measures its own successes and failures and adjusts its persistence accordingly. The study shows that this simple extra layer of adaptation can improve performance in static and changing environments alike, and can plug into larger systems such as queue managers and neural networks. Looking ahead, similar ideas could help many other learning methods automatically calibrate how quickly they change their minds, making artificial decision-makers both more robust and easier to deploy.

Citation: Nikhalat-Jahromi, A., Saghiri, A.M. & Meybodi, M.R. SVDHLA: symmetric variable depth hybrid learning automaton and its application. Sci Rep 16, 14336 (2026). https://doi.org/10.1038/s41598-026-43271-8

Keywords: learning automata, reinforcement learning, exploration exploitation, adaptive dropout, multi-armed bandit