Clear Sky Science

MoSA-Det: motion state adaptive object detection for sports videos

Sharper Eyes on the Sports Field

When you watch a live match on TV, the cameras and graphics seem to track every player and the ball effortlessly. Behind the scenes, though, computers struggle, especially with fast action. This paper introduces a new way for algorithms to "watch" sports that keeps up with rapid motion and delivers cleaner, more reliable tracking for uses such as broadcast overlays, tactical analysis, and training.

Figure 1. How adapting to motion speed helps computers track players and the ball more clearly in sports videos.

Why Fast Action Confuses Computers

Sports videos are full of quick sprints, long passes, and sweeping camera moves. For computer vision systems, that creates two big problems. First, when players or the ball move quickly, they become blurred and lose the sharp edges and textures that detectors rely on. The authors show that in these cases the feature signals inside the detection network grow weaker and less stable, so the system is less sure of what it sees. Second, many modern video methods try to improve their decisions by blending information from several nearby frames. That works well when objects barely move between frames, but in fast sports they can jump so far that their positions no longer line up, so adding more frames actually injects noise and reduces accuracy.
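To see why blur weakens the evidence a detector relies on, here is a toy Python illustration (not from the paper). It models a single bright point that sweeps across several pixels during one exposure; the hypothetical helper `motion_blur` spreads its light evenly along that path. It only shows pixel-level contrast, not the internal network signals the authors measured, but it conveys the basic intuition: the faster the motion, the weaker the peak.

```python
from typing import List


def motion_blur(frame: List[float], path_length: int) -> List[float]:
    """Toy horizontal motion blur: each pixel's brightness is spread evenly
    over the `path_length` positions the object sweeps during the exposure."""
    out = [0.0] * len(frame)
    for i, value in enumerate(frame):
        for step in range(path_length):
            j = i + step
            if j < len(frame):  # light that leaves the frame is simply dropped
                out[j] += value / path_length
    return out


# A single bright point (e.g. a ball) on a dark background.
sharp = [0.0] * 10
sharp[2] = 1.0

for k in (1, 2, 4):  # pixels travelled during one exposure
    print(k, max(motion_blur(sharp, k)))
# 1 1.0
# 2 0.5
# 4 0.25
```

The total brightness is unchanged, but it is spread thinner, so no single location carries strong evidence of the object.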

A System That Adapts to Motion

The researchers propose MoSA-Det, a framework that changes how it processes each region of an image depending on how fast that region is moving. Instead of treating every pixel the same, the system first estimates a motion "state" for each location, classifying it as static, slow, or fast. It does this by comparing features between consecutive frames and measuring how strongly they match nearby areas. This motion map then guides two key modules: one that improves the clarity of what is seen in a single frame, and another that decides how much to trust information from other frames over time.
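The motion map is described above only in outline. The following Python sketch shows one simple way such a map could be computed, using plain block matching as a stand-in: for each position it searches a small window of the previous frame for the best-matching feature vector and labels the position by how far that match moved. The 1D feature maps, the squared-distance score, and the thresholds `radius` and `slow_max` are all assumptions for illustration; the paper's matching measure and cut-offs may differ.

```python
from typing import List, Sequence

Feature = Sequence[float]


def match_score(a: Feature, b: Feature) -> float:
    """Higher is better: negative squared distance between two feature vectors."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def motion_states(prev: List[Feature], curr: List[Feature],
                  radius: int = 3, slow_max: int = 1) -> List[str]:
    """Label each position of `curr` as 'static', 'slow' or 'fast'.

    For every position, search the previous frame within +/- `radius`
    positions for the best-matching feature; the size of that displacement
    decides the state. `radius` and `slow_max` are illustrative thresholds.
    """
    states = []
    for i, feat in enumerate(curr):
        best_shift, best_score = 0, float("-inf")
        for shift in range(-radius, radius + 1):
            j = i - shift  # where this content would have sat one frame earlier
            if not 0 <= j < len(prev):
                continue
            score = match_score(feat, prev[j])
            # On ties, prefer the smaller displacement (the "no motion" guess).
            if score > best_score or (score == best_score and abs(shift) < abs(best_shift)):
                best_shift, best_score = shift, score
        if best_shift == 0:
            states.append("static")
        elif abs(best_shift) <= slow_max:
            states.append("slow")
        else:
            states.append("fast")
    return states


A, B, C = (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)
empty = (0.0, 0.0)
prev = [empty] * 12
curr = [empty] * 12
prev[1], prev[4], prev[9] = A, B, C
curr[1], curr[7], curr[10] = A, B, C  # A stays, C moves 1, B jumps 3
states = motion_states(prev, curr)
print(states[1], states[10], states[7])  # static slow fast
```

Only the object positions are printed: a spot an object has just vacated has no good match at zero displacement, so plain block matching can mislabel it, one reason a learned estimator is preferable in practice.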

Cleaning Up Blurry Players and Balls

The first module, called the Motion-Aware Adaptive Feature Module, tackles the blur problem inside individual frames. It passes each region through several branches that look over different-sized neighborhoods, from very local details to a wider surrounding area. The motion map tells the network how to mix these views: slow or still regions rely more on small neighborhoods to preserve fine detail, while fast regions lean on broader views that can gather scattered information. For the very fastest areas, such as a flying ball, the module activates a special branch that learns to "bend" its sampling grid to better follow distorted shapes, helping recover useful signals even under strong blur.
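As a rough picture of the mixing step, the Python sketch below runs three averaging branches with small, medium, and large neighborhoods over a 1D signal and blends them with per-state weights. The radii and the `STATE_WEIGHTS` table are hypothetical hand-set values, whereas the paper's module learns its mixing; the deformable branch for the fastest regions is not modeled.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical mixing weights for the (small, medium, large) neighborhood
# branches; each row sums to 1. The real module learns its mixing instead.
STATE_WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "static": (0.7, 0.2, 0.1),  # favor fine local detail
    "slow": (0.5, 0.3, 0.2),
    "fast": (0.1, 0.3, 0.6),  # favor wide context to gather smeared evidence
}


def box_filter(signal: Sequence[float], radius: int) -> List[float]:
    """Average each position over +/- `radius` neighbors (window clipped at edges)."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def motion_adaptive_features(signal: Sequence[float], states: Sequence[str],
                             radii: Tuple[int, int, int] = (0, 1, 3)) -> List[float]:
    """Run three branches with growing neighborhoods and mix them per position."""
    branches = [box_filter(signal, r) for r in radii]
    return [
        sum(w * branch[i] for w, branch in zip(STATE_WEIGHTS[state], branches))
        for i, state in enumerate(states)
    ]


spike = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
static_out = motion_adaptive_features(spike, ["static"] * 7)
fast_out = motion_adaptive_features(spike, ["fast"] * 7)
print(round(static_out[3], 3), round(fast_out[3], 3))  # 0.781 0.286
```

The static setting keeps most of the spike's strength in place, while the fast setting spreads it to the surroundings (position 5 receives about 0.12 versus 0.02). Because each row of weights sums to 1, a uniform region comes out unchanged whatever its state.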

Figure 2. How a smart detector treats slow and fast motion differently over time to avoid blur and misalignment in sports videos.

Using Time Only When It Helps

The second module, the State-Guided Temporal Aggregation Module, decides how to combine information across frames without letting misalignment cause harm. It uses the motion map to adjust, for each location, the weights given to past and future frames. In static regions, it blends several frames fairly evenly, which smooths out noise and makes detections more stable. In fast-moving regions, it concentrates weight on the current frame, uses learned shifts to roughly align neighboring frames before mixing them in, and even then blends them cautiously. A small extra branch also nudges the final bounding boxes to correct for the way blur can shift the apparent center of a moving object.
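The temporal weighting can be sketched the same way. In the Python example below, each position blends the previous, current, and next frames with weights chosen by its motion state, after shifting the neighboring frames by a per-position displacement. That integer `shifts` list is a hypothetical stand-in for the learned shifts, and the `TEMPORAL_WEIGHTS` values are invented for illustration; the bounding-box correction branch is not modeled.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical (previous, current, next) frame weights per motion state.
TEMPORAL_WEIGHTS: Dict[str, Tuple[float, float, float]] = {
    "static": (1 / 3, 1 / 3, 1 / 3),  # blend evenly to smooth noise
    "slow": (0.25, 0.5, 0.25),
    "fast": (0.1, 0.8, 0.1),  # trust the current frame most
}


def sample(frame: Sequence[float], pos: int) -> float:
    """Read a value, clamping the position to the frame borders."""
    return frame[min(max(pos, 0), len(frame) - 1)]


def aggregate(prev: Sequence[float], curr: Sequence[float], nxt: Sequence[float],
              states: Sequence[str], shifts: Sequence[int]) -> List[float]:
    """Blend three frames position by position.

    shifts[i] is the assumed displacement per frame at position i: content at i
    now was at i - shift in the previous frame and will be at i + shift next.
    """
    out = []
    for i, state in enumerate(states):
        w_prev, w_curr, w_next = TEMPORAL_WEIGHTS[state]
        s = shifts[i]
        out.append(w_prev * sample(prev, i - s)
                   + w_curr * curr[i]
                   + w_next * sample(nxt, i + s))
    return out


def spike(n: int, pos: int) -> List[float]:
    """A length-n signal that is 1.0 at `pos` and 0.0 elsewhere."""
    signal = [0.0] * n
    signal[pos] = 1.0
    return signal


# An object moving 2 positions per frame: at 2, then 4, then 6.
prev, curr, nxt = spike(9, 2), spike(9, 4), spike(9, 6)

naive = aggregate(prev, curr, nxt, ["static"] * 9, [0] * 9)
adaptive = aggregate(prev, curr, nxt, ["fast"] * 9, [2] * 9)
print(round(naive[4], 3), round(adaptive[4], 3))  # 0.333 1.0
```

Evenly averaging unaligned frames dilutes the object to a third of its strength, which is the misalignment problem in miniature; aligning first and then favoring the current frame keeps the object's response intact.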

What the Results Mean for Sports Tech

Tested on two large sports video datasets covering soccer, basketball, and volleyball, MoSA-Det consistently outperforms strong existing methods. It detects players and the ball more accurately, especially in crowded scenes, under heavy motion, and at stricter accuracy thresholds that demand very precisely placed boxes around each object. Importantly, it still runs fast enough for real-time broadcasting. For a layperson, the main message is that this system teaches computers to pay attention differently to slow and fast motion instead of using a one-size-fits-all approach, leading to cleaner tracking and more reliable graphics during high-speed play.

Citation: Yang, L., Sun, W. & Ren, J. MoSA-Det: motion state adaptive object detection for sports videos. Sci Rep 16, 15969 (2026). https://doi.org/10.1038/s41598-026-43231-2

Keywords: sports video detection, object tracking, motion blur, computer vision, deep learning