Clear Sky Science

Deep reinforcement learning for network resource optimization in MIMO-NOMA networks to maximize utilization with minimal overhead


Why smarter phone networks matter

As our phones, cars, and countless sensors compete for wireless bandwidth, today’s networks struggle to keep everyone connected smoothly, especially when users are moving fast through cities and along highways. This paper presents a new way to make future 5G and 6G-style networks far more efficient and reliable by teaching the network to learn, in real time, which connections to use and how to share limited radio resources among many users with minimal waste.

Figure 1.

Busy airwaves and the crowding problem

Modern wireless systems must serve huge numbers of users who are constantly on the move. New technologies such as MIMO, which uses many antennas at once, and NOMA, which lets multiple users share the same slice of spectrum, promise big gains in capacity. But in practice, when people travel by car or train and signals fluctuate rapidly, it becomes extremely difficult to decide which base station to connect each user to, how much power to assign, and how to prevent users from interfering with each other. Many existing optimization methods assume fairly stable conditions or perfect knowledge of the radio channel, assumptions that break down in fast, crowded real-world settings.

Letting the network predict the best connection

The authors propose an approach called OSIANRO that starts by improving how devices are assigned to networks and channels. Instead of relying on fixed rules, it uses a strengthened version of gradient boosting, a popular machine learning method. This upgraded model learns from many examples of past network behavior (such as signal strength, delay, and the type of application in use) to predict whether a given connection choice is likely to succeed or fail. The method is mathematically redesigned to penalize overly complex decisions and to handle rare but important problem cases, such as users who are hard to serve. By scoring and ranking which pieces of information matter most, it keeps only the most useful features, which reduces both decision time and errors.
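To make the idea of a regularized, imbalance-aware boosted model more concrete, here is a minimal sketch in plain Python. It is not the authors' OSIANRO model. It assumes second-order (XGBoost-style) boosting over one-split decision stumps, uses an L2 penalty `lam` on leaf weights as the complexity penalty, and uses a `pos_weight` factor to up-weight the rare failure class. The toy features (signal strength in dBm, delay in ms), the labels, and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, grad, hess, lam):
    """Return the (gain, feature, threshold, left_w, right_w) split with the best regularized gain."""
    best = None
    G, H = sum(grad), sum(hess)
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so the right-hand leaf is never empty.
        for thr in sorted({row[j] for row in X})[:-1]:
            gl = sum(g for row, g in zip(X, grad) if row[j] <= thr)
            hl = sum(h for row, h in zip(X, hess) if row[j] <= thr)
            gr, hr = G - gl, H - hl
            gain = gl * gl / (hl + lam) + gr * gr / (hr + lam) - G * G / (H + lam)
            if best is None or gain > best[0]:
                # Leaf weight -G/(H + lam): a larger lam shrinks it toward zero.
                best = (gain, j, thr, -gl / (hl + lam), -gr / (hr + lam))
    return best

def train(X, y, rounds=20, lr=0.3, lam=1.0, pos_weight=1.0):
    """Boost stumps on weighted logistic loss; also accumulate per-feature split gain."""
    stumps, scores = [], [0.0] * len(X)
    importance = [0.0] * len(X[0])
    weights = [pos_weight if t == 1 else 1.0 for t in y]  # up-weight the rare class
    for _ in range(rounds):
        p = [sigmoid(s) for s in scores]
        grad = [w * (pi - t) for w, pi, t in zip(weights, p, y)]
        hess = [w * pi * (1.0 - pi) for w, pi in zip(weights, p)]
        gain, j, thr, wl, wr = fit_stump(X, grad, hess, lam)
        importance[j] += gain
        stumps.append((j, thr, lr * wl, lr * wr))
        scores = [s + (lr * wl if row[j] <= thr else lr * wr)
                  for s, row in zip(scores, X)]
    return stumps, importance

def predict_proba(stumps, row):
    """Probability that the connection choice fails (label 1)."""
    return sigmoid(sum(wl if row[j] <= thr else wr for j, thr, wl, wr in stumps))

# Hypothetical toy data: [signal strength (dBm), delay (ms)]; label 1 = failed connection.
X = [[-60, 20], [-65, 25], [-70, 30], [-62, 22],
     [-68, 95], [-72, 35], [-95, 90], [-98, 120]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

if __name__ == "__main__":
    model, feature_gain = train(X, y, pos_weight=3.0)
    print("P(fail | weak signal):", predict_proba(model, [-97, 100]))
    print("gain per feature:", feature_gain)
```

The accumulated split gain gives a simple way to rank features, which mirrors the feature-scoring step described above in spirit only. A production system would more likely rely on an established boosting library than on hand-written stumps.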

Teaching the network to share fairly and avoid clashes

Once OSIANRO has chosen a promising network or channel, it must decide how to share spectrum and power among many users. The authors build a detailed mathematical model that describes how much data users can send, how signals interfere, and how often users collide when they try to use the airwaves at the same time. Instead of solving this complex puzzle with fixed formulas, the system uses deep reinforcement learning, in which many software “agents” learn through trial and error. Each agent represents a user that chooses which resource block to access and how aggressively to compete for it. Agents are rewarded when overall data rates increase and penalized when interference or channel overhead rises, so they gradually converge on strategies that keep collisions low while pushing total throughput higher.
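The reward-and-penalty loop described above can be illustrated with a much simpler stand-in. The sketch below uses independent, stateless tabular Q-learning rather than deep neural networks, so it shows only the trial-and-error principle and not the authors' architecture. The number of agents and blocks, the per-block SNR values, the collision penalty of -2.0, and the exploration schedule are all assumptions chosen for illustration.

```python
import math
import random

def simulate(num_agents=4, num_blocks=6, steps=3000, alpha=0.1, seed=0):
    """Independent stateless Q-learners competing for shared resource blocks."""
    rng = random.Random(seed)
    # Hypothetical per-block SNR (linear scale); a lone user earns log2(1 + SNR).
    snr = [rng.uniform(5.0, 20.0) for _ in range(num_blocks)]
    q = [[0.0] * num_blocks for _ in range(num_agents)]
    collided = []  # fraction of agents that collided at each step

    for t in range(steps):
        # Exploration rate decays linearly from 1.0 and is floored at 0.05.
        eps = max(0.05, 1.0 - t / (0.5 * steps))
        choices = []
        for a in range(num_agents):
            if rng.random() < eps:
                choices.append(rng.randrange(num_blocks))  # explore
            else:
                choices.append(max(range(num_blocks), key=lambda b: q[a][b]))  # exploit
        load = [choices.count(b) for b in range(num_blocks)]

        hits = 0
        for a, b in enumerate(choices):
            if load[b] == 1:
                reward = math.log2(1.0 + snr[b])  # sole user: achievable-rate proxy
            else:
                reward = -2.0                     # collision penalty (assumed value)
                hits += 1
            # Stateless Q-update: move the estimate toward the observed reward.
            q[a][b] += alpha * (reward - q[a][b])
        collided.append(hits / num_agents)

    window = steps // 10
    early = sum(collided[:window]) / window
    late = sum(collided[-window:]) / window
    return early, late, q

if __name__ == "__main__":
    early_rate, late_rate, _ = simulate()
    print("collision rate, first 10% of steps:", early_rate)
    print("collision rate, last 10% of steps:", late_rate)
```

Each agent learns only from its own rewards, yet the agents tend to spread across different blocks because a collision lowers the value of the contested block for everyone involved. A deep reinforcement learning system would replace the Q-table with a neural network that also takes channel conditions as input.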

Figure 2.

Performance under city streets and highways

To test OSIANRO, the authors simulate realistic urban and expressway scenarios using well-known channel models and open-source tools. They compare their system against an advanced benchmark that uses a specialized quantum-inspired device to optimize resource allocation. Across many experiments, OSIANRO consistently increases the total data rate, squeezes more information out of each unit of spectrum, and sharply cuts the number of collisions, even as the number of users and their speeds grow. The improved gradient-boosted network selection proves more accurate and faster than standard versions, while the reinforcement learning component adapts smoothly to changing radio conditions without relying on perfect prior knowledge.

What this means for everyday connectivity

In simple terms, the work shows that giving wireless networks the ability to predict and learn on their own can make crowded airwaves behave much more like well-organized highways than chaotic parking lots. By smartly choosing which tower and channel each device should use, and by continuously adjusting how users share the spectrum, OSIANRO delivers more data to more users with fewer slowdowns and glitches. While the results come from detailed simulations rather than live deployments, they suggest a practical path toward mobile networks that remain fast, fair, and stable even when we pack them with moving cars, trains, and billions of connected devices.

Citation: Lahza, H., Sreenivasa, B.R., Lahza, H. et al. Deep reinforcement learning for network resource optimization in MIMO-NOMA networks to maximize utilization with minimal overhead. Sci Rep 16, 12635 (2026). https://doi.org/10.1038/s41598-026-42953-7

Keywords: 5G resource allocation, MIMO NOMA, deep reinforcement learning, network optimization, wireless interference