Clear Sky Science

Underwater acoustic vector DOA estimation in hybrid noise environments based on sparsely-gated mixture-of-experts mechanism

Listening for Hidden Signals Underwater

Ships, submarines, underwater robots, and even marine biologists all rely on listening to faint sounds in the ocean to figure out where they are coming from. But the sea is a noisy place: engines, waves, animals, and instruments themselves all add clutter. This study presents a new way to pinpoint the direction of underwater sounds even when the noise is messy and unpredictable, using a modern form of artificial intelligence that learns to cope with different kinds of noise instead of assuming everything is simple and uniform.

Figure 1.

Why Finding Direction Is So Hard in the Ocean

To locate a sound source, engineers use an array of underwater microphones, called hydrophones, lined up in a row. By comparing the tiny differences in when a sound reaches each sensor, they can estimate the direction from which it arrived, a task known as direction-of-arrival (DOA) estimation. Classic methods assume the background noise is like a gentle, even hiss—mathematically, “white Gaussian noise.” Real oceans rarely behave so nicely. Noise can be impulsive, like sudden pops; colored, with more energy at some frequencies than others; or uneven across sensors. This mix of behaviors, called hybrid noise, breaks the assumptions that older algorithms depend on, causing their accuracy to collapse just when conditions are most challenging.
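The timing idea can be made concrete with the standard far-field model for a uniform line of sensors. If neighboring sensors are a distance d apart and a plane wave arrives at angle θ measured from the array axis, each successive sensor hears the wave an extra d·cos θ / c later, where c is the speed of sound. The Python sketch below assumes a nominal c of 1500 m/s and plain pressure sensors; the function names are illustrative and are not taken from the paper.

```python
import numpy as np

SOUND_SPEED = 1500.0  # m/s; nominal speed of sound in seawater (assumed)


def arrival_delays(theta_deg, num_sensors, spacing_m, c=SOUND_SPEED):
    """Delay (s) of a far-field plane wave at each sensor of a uniform line
    array, relative to sensor 0; theta is measured from the array axis."""
    theta = np.deg2rad(theta_deg)
    m = np.arange(num_sensors)
    return m * spacing_m * np.cos(theta) / c


def steering_vector(theta_deg, num_sensors, spacing_m, freq_hz, c=SOUND_SPEED):
    """Narrowband phase factors exp(-j*2*pi*f*tau_m) seen across the array."""
    tau = arrival_delays(theta_deg, num_sensors, spacing_m, c)
    return np.exp(-2j * np.pi * freq_hz * tau)


def angle_from_delay(delta_tau, spacing_m, c=SOUND_SPEED):
    """Invert the delay between two adjacent sensors back to an angle (deg)."""
    cos_theta = np.clip(delta_tau * c / spacing_m, -1.0, 1.0)
    return np.rad2deg(np.arccos(cos_theta))


# Half-wavelength spacing at 1 kHz: wavelength = 1500 / 1000 = 1.5 m.
tau = arrival_delays(60.0, num_sensors=4, spacing_m=0.75)
print(tau)  # delays grow by 0.25 ms per sensor
print(angle_from_delay(tau[1] - tau[0], 0.75))
```

With 0.75 m spacing, a 0.25 ms delay between neighboring sensors maps back to about 60°. Any noise that distorts these small delays or phase differences distorts the angle estimate directly.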

A Smarter Listening Line of Sensors

The researchers base their work on a simple but powerful sensor layout: a straight line of so-called vector hydrophones, which measure both pressure and particle motion in the water. When distant sound sources emit waves, those waves reach each sensor at slightly different times and phases, depending on the angle of arrival. From these measurements, the system builds a covariance matrix—a compact summary of how the signals at different sensors relate to each other over time. This matrix contains the geometric clues needed to infer direction, but it is tangled up with all the complicated noise present in the environment.
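As an illustration, the sketch below assumes a common textbook model for a 2-D vector hydrophone, in which each sensor reports pressure plus two particle-velocity components weighted by cos θ and sin θ. It simulates snapshots from a few sources in white Gaussian noise and forms the sample covariance matrix R = X X^H / T, where X holds T snapshots. The unit-power random source signals, half-wavelength spacing, and function names are assumptions for illustration; the paper's exact signal model may differ.

```python
import numpy as np


def avs_steering(theta_deg, num_sensors, d_over_lambda=0.5):
    """Steering vector for a line of 2-D vector hydrophones.

    Assumed model: sensor m sees the pressure phase
    exp(-j*2*pi*(d/lambda)*m*cos(theta)) and reports [pressure, v_x, v_y]
    weighted by [1, cos(theta), sin(theta)]. Output is ordered sensor by
    sensor and has length 3 * num_sensors.
    """
    theta = np.deg2rad(theta_deg)
    m = np.arange(num_sensors)
    pressure = np.exp(-2j * np.pi * d_over_lambda * m * np.cos(theta))
    channel_weights = np.array([1.0, np.cos(theta), np.sin(theta)])
    return np.kron(pressure, channel_weights)


def complex_gaussian(shape, rng):
    """Circular complex Gaussian samples with unit average power."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def simulate_snapshots(thetas_deg, num_sensors, num_snapshots, noise_std=1.0, seed=None):
    """X = A S + N for far-field sources at the given angles (white noise only)."""
    rng = np.random.default_rng(seed)
    A = np.stack([avs_steering(t, num_sensors) for t in thetas_deg], axis=1)  # (3M, K)
    S = complex_gaussian((A.shape[1], num_snapshots), rng)                   # source signals
    N = noise_std * complex_gaussian((A.shape[0], num_snapshots), rng)        # sensor noise
    return A @ S + N


def sample_covariance(X):
    """R = X X^H / T, the average outer product of the T snapshots."""
    return X @ X.conj().T / X.shape[1]


X = simulate_snapshots([40.0, 110.0], num_sensors=8, num_snapshots=200, seed=0)
R = sample_covariance(X)
print(X.shape, R.shape)  # 8 sensors x 3 channels = 24 rows
```

R is Hermitian, so its upper triangle already carries all of its information; the direction clues sit in the phase pattern across its entries.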

Turning Noisy Data into Learnable Patterns

Neural networks typically work with real numbers, but the covariance matrix is complex-valued. The team therefore splits it into two real matrices, representing the real and imaginary parts, and feeds them as a two-channel “image” into a convolutional neural network (CNN). This CNN scans the matrix to uncover spatial patterns that distinguish true signal structure from noise. Instead of relying on hand-designed formulas, the CNN learns these features directly from data, gradually building up from simple local relationships to higher-level patterns that are informative for locating sound sources.
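A minimal NumPy sketch of this preprocessing step, and of the basic operation a convolutional layer applies to the result, follows. Scaling by the largest magnitude is an assumed normalization, and the single ReLU convolution layer is only a stand-in for the paper's CNN, whose depth and kernel sizes are not described here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def covariance_to_channels(R, normalize=True):
    """Complex (N, N) covariance -> real (2, N, N) array: [real part, imaginary part]."""
    R = np.asarray(R, dtype=complex)
    if normalize:
        R = R / np.max(np.abs(R))  # assumed scaling: every entry lies in [-1, 1]
    return np.stack([R.real, R.imag], axis=0).astype(np.float32)


def conv2d_relu(image, kernels, bias):
    """One convolutional layer: multi-channel 'valid' cross-correlation + ReLU.

    image:   (C_in, H, W)
    kernels: (C_out, C_in, k, k)
    bias:    (C_out,)
    returns: (C_out, H - k + 1, W - k + 1)
    """
    k = kernels.shape[-1]
    # windows[c, h, w, i, j] == image[c, h + i, w + j]
    windows = sliding_window_view(image, (k, k), axis=(1, 2))
    out = np.einsum("chwij,ocij->ohw", windows, kernels) + bias[:, None, None]
    return np.maximum(out, 0.0)


R = np.array([[2.0, 1 + 1j], [1 - 1j, 3.0]])
img = covariance_to_channels(R)
print(img.shape)  # (2, 2, 2): two real channels for a 2 x 2 covariance
```

Because R is Hermitian, the real channel is symmetric and the imaginary channel is antisymmetric, so no information is lost in the split.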

Figure 2.

Many Specialists and One Smart Coordinator

The key innovation is what happens after the CNN: a sparsely-gated mixture-of-experts (SMoE) network. Rather than one large, monolithic model trying to handle every situation, the system includes several smaller expert networks, each trained to excel under a specific noise type: white, pink, red, blue, violet, or impulsive noise. A separate gating network looks at the features extracted by the CNN and, for each incoming example, decides which few experts are most relevant. Only those top experts are activated, and their outputs are combined into a final estimate of how likely it is that a sound source lies at each angle from 0° to 180°. This design makes the model both adaptive, because it changes which experts it listens to as noise conditions vary, and efficient, because it avoids running all experts all the time.
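The routing step can be sketched with plain top-k gating: score every expert with a linear gate, keep the k highest scores, renormalize them with a softmax, and evaluate only the chosen experts. The one-layer sigmoid experts, the 1° angle grid, the random feature vector, and k = 2 are hypothetical choices for illustration; the paper's gate may include extra terms (for example noise or load balancing) that are omitted here.

```python
import numpy as np

ANGLE_GRID = np.arange(0, 181)  # candidate directions, 1-degree steps (assumed)


def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()


def top_k_gate(features, W_gate, k=2):
    """Score all experts, keep the k best, renormalize their weights to sum to 1."""
    logits = features @ W_gate              # one score per expert
    chosen = np.argsort(logits)[-k:][::-1]  # indices of the k largest scores
    return chosen, softmax(logits[chosen])


def smoe_forward(features, W_gate, experts, k=2):
    """Run only the selected experts and mix their angular spectra."""
    chosen, weights = top_k_gate(features, W_gate, k)
    spectrum = sum(w * experts[i](features) for i, w in zip(chosen, weights))
    return spectrum, chosen


def make_expert(W, b):
    """Hypothetical expert: one linear layer + sigmoid -> per-angle source probability."""
    return lambda x: 1.0 / (1.0 + np.exp(-(x @ W + b)))


rng = np.random.default_rng(1)
feat_dim, n_experts = 16, 6        # e.g. one expert per noise type
x = rng.standard_normal(feat_dim)  # stand-in for the CNN feature vector
W_gate = rng.standard_normal((feat_dim, n_experts))
experts = [make_expert(rng.standard_normal((feat_dim, ANGLE_GRID.size)),
                       np.zeros(ANGLE_GRID.size))
           for _ in range(n_experts)]
spectrum, chosen = smoe_forward(x, W_gate, experts, k=2)
print("experts used:", chosen, "peak at", ANGLE_GRID[np.argmax(spectrum)], "deg")
```

The estimated direction is the angle at which the mixed spectrum peaks. Because unselected experts are never called, the cost per example grows with k rather than with the total number of experts.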

Testing in Tough, Realistic Conditions

To train this system, the authors first generated data in which each expert saw only one noise type, allowing it to specialize. They then trained the gating network on mixtures of all six noises, mimicking real hybrid environments. They also evaluated the model on a large, realistic test set that included both simulated noise and actual recorded underwater noise, across a wide range of signal strengths and data lengths. Compared with well-known classical techniques and other deep learning approaches, the SMoE model consistently delivered smaller errors and higher success rates, particularly when the noise was strong or when only a limited amount of data was available. At a signal-to-noise ratio of 0 dB, where signal and noise power are equal, the model achieved an average angular error of under one degree, while rival methods could be off by many degrees.
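A training and test recipe like this needs noise of each type, a way to mix them, and a way to set the SNR. The sketch below uses the usual convention that colored noise has a power spectral density proportional to f^β (β = 0, -1, -2, +1, +2 for white, pink, red, blue, violet), a hypothetical Bernoulli-Gaussian model for impulsive noise, and random Dirichlet weights for the hybrid mixture. It generates one channel at a time and ignores correlation across sensors; none of these choices are taken from the paper. It also includes a mean absolute angular error of the kind quoted above.

```python
import numpy as np

# Exponent beta in PSD(f) proportional to f**beta, the usual "noise color" convention.
PSD_EXPONENT = {"white": 0.0, "pink": -1.0, "red": -2.0, "blue": 1.0, "violet": 2.0}


def colored_noise(color, n, rng):
    """Shape white Gaussian noise in the frequency domain; returns unit-power samples."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)
    f[0] = f[1]                                   # avoid 0 ** negative at DC
    spectrum *= f ** (PSD_EXPONENT[color] / 2.0)  # amplitude goes as sqrt(PSD)
    x = np.fft.irfft(spectrum, n)
    return x / np.std(x)


def impulsive_noise(n, rng, p=0.01, scale=10.0):
    """Hypothetical Bernoulli-Gaussian model: Gaussian background plus rare large spikes."""
    x = rng.standard_normal(n)
    spikes = rng.random(n) < p
    x[spikes] += scale * rng.standard_normal(spikes.sum())
    return x / np.std(x)


def hybrid_noise(n, rng):
    """Hypothetical hybrid mixture: random Dirichlet weights over all six noise types."""
    parts = [colored_noise(c, n, rng) for c in PSD_EXPONENT] + [impulsive_noise(n, rng)]
    weights = rng.dirichlet(np.ones(len(parts)))
    x = sum(w * part for w, part in zip(weights, parts))
    return x / np.std(x)


def add_noise_at_snr(signal, noise, snr_db):
    """Rescale noise so that 10*log10(P_signal / P_noise) == snr_db, then add it."""
    p_signal = np.mean(np.abs(signal) ** 2)
    p_noise = np.mean(np.abs(noise) ** 2)
    noise = noise * np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + noise, noise


def mean_abs_angle_error(est_deg, true_deg):
    """Average absolute difference between estimated and true angles, in degrees."""
    return float(np.mean(np.abs(np.asarray(est_deg) - np.asarray(true_deg))))


rng = np.random.default_rng(0)
tone = np.exp(2j * np.pi * 0.05 * np.arange(1000))  # unit-power test signal
noisy, noise = add_noise_at_snr(tone, hybrid_noise(1000, rng), snr_db=0.0)
print(np.mean(np.abs(noise) ** 2))  # matches the signal power (1.0) up to rounding at 0 dB
```

Training a single expert would draw from one noise generator only, while the gate would see hybrid_noise samples; sweeping snr_db reproduces the "wide range of signal strengths" in a test set.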

What This Means for Future Underwater Sensing

In plain terms, this work shows that letting multiple specialized AI “listeners” share the job, and choosing among them on the fly, can dramatically improve our ability to tell where underwater sounds come from in chaotic, noisy conditions. The approach can be adapted to other sensor layouts beyond simple linear arrays, and the same idea—mixture-of-experts with a smart gate—could help in radar, robotics, and other fields where signals must be located in the presence of complex interference. For applications that depend on reliable underwater listening, from navigation to environmental monitoring, this method offers a more flexible and robust way to hear through the noise.

Citation: Xu, W., Yi, S., Gu, H. et al. Underwater acoustic vector DOA estimation in hybrid noise environments based on sparsely-gated mixture-of-experts mechanism. Sci Rep 16, 6192 (2026). https://doi.org/10.1038/s41598-026-37217-3

Keywords: underwater acoustics, direction of arrival, hybrid noise, deep learning, mixture of experts