Clear Sky Science

A CNN–Bi-LSTM pipeline and open FSW dataset for freestyle wrestling action recognition

Teaching Computers to Watch Wrestling

Freestyle wrestling is fast, tangled, and messy to watch—even for humans. For computers, telling one throw from another in a crowded arena is even harder. This study shows how a carefully designed video pipeline and a new public dataset can help machines recognize specific wrestling techniques, opening doors for smarter sports analytics, coaching tools, and automated highlight generation.

The Challenge of Close-Contact Sports

Most modern video-recognition systems were trained on clips where people are relatively separate and easy to see, like someone jogging or swinging a tennis racket. Freestyle wrestling is different: athletes are locked together, limbs overlap, and the scene is full of distractions from referees, mats, and cheering crowds. Standard benchmarks do not capture this complexity, so methods that work well on everyday actions often stumble when wrestlers clinch, roll, and twist in rapid succession.

Building a New Library of Wrestling Moves

To tackle this gap, the authors created the Open FSW dataset, a curated collection of 210 short clips of freestyle wrestling. Each clip shows exactly one complete move, chosen from seven well-defined techniques such as hip throws, leg tackles, and rolling sweeps. The clips come from two sources: controlled training sessions with a small group of athletes, and broadcast matches from public competitions, which add variety in camera angle, lighting, and background clutter. Experts and referees helped label each clip, and the dataset is split so that clips from the same match or training session never appear in both training and testing, reducing the risk of overestimating performance.
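The leakage-free split described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `(clip_id, group_id)` pairs, the function name `group_split`, and the `test_fraction` parameter are all assumptions made for the example, with `group_id` standing for whichever match or training session a clip came from.

```python
import random
from collections import defaultdict


def group_split(clips, test_fraction=0.2, seed=0):
    """Split clips so that no source group lands in both train and test.

    `clips` is a list of (clip_id, group_id) pairs. Both field names are
    hypothetical; group_id identifies the match or training session.
    `test_fraction` is a fraction of groups, not of clips.
    """
    by_group = defaultdict(list)
    for clip_id, group_id in clips:
        by_group[group_id].append(clip_id)

    # Sort before shuffling so the result depends only on the seed.
    groups = sorted(by_group)
    random.Random(seed).shuffle(groups)

    # Hold out whole groups, always at least one.
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])

    train = [c for g in groups if g not in test_groups for c in by_group[g]]
    test = [c for g in groups if g in test_groups for c in by_group[g]]
    return train, test, test_groups
```

Because whole groups are held out, the test set can end up slightly larger or smaller than the requested fraction of clips. That imprecision is the price of keeping near-identical footage from one match out of both sides of the split.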

Figure 1.

Focusing on the Wrestlers, Not the Crowd

The heart of the approach is to teach the computer to “pay attention” to the wrestlers and largely ignore the rest. Each video frame first passes through a segmentation model that separates the athletes from the background and produces clean foreground silhouettes. These foreground frames are then processed by a deep image network that compresses each image into a compact feature vector—essentially a numerical summary of the wrestlers’ shapes and positions at that moment. Finally, a bidirectional sequence model looks at the entire series of frame summaries, from start to finish and back again, to decide which of the seven techniques is being performed in the clip.
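To make the order of operations concrete, here is a toy sketch of the three stages in plain Python. Every piece is a deliberately simple stand-in chosen for this example: frames are small grids of numbers, the masks are supplied by hand rather than produced by a segmentation model, `encode` replaces the deep image network with row averages, and `recurrent_pass` replaces an LSTM direction with a running blend. None of these function names come from the paper.

```python
def apply_mask(frame, mask):
    """Keep foreground pixels (mask value 1) and zero out the background."""
    return [[p if m else 0 for p, m in zip(frame_row, mask_row)]
            for frame_row, mask_row in zip(frame, mask)]


def encode(frame):
    """Toy stand-in for the image encoder: mean intensity of each row."""
    return [sum(row) / len(row) for row in frame]


def recurrent_pass(features, decay=0.5):
    """Toy stand-in for one LSTM direction: a running blend of the features."""
    state = [0.0] * len(features[0])
    for vec in features:
        state = [decay * s + (1 - decay) * v for s, v in zip(state, vec)]
    return state


def clip_descriptor(frames, masks):
    """Mask each frame, encode it, then summarize the sequence both ways."""
    features = [encode(apply_mask(f, m)) for f, m in zip(frames, masks)]
    forward = recurrent_pass(features)          # first frame to last
    backward = recurrent_pass(features[::-1])   # last frame to first
    return forward + backward                   # concatenate both summaries
```

In a real system, a final classification layer would map this concatenated descriptor to scores for the seven techniques. The point of reading the sequence in both directions is that the start of a move is interpreted with knowledge of how it ends, and vice versa.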

How Well the System Learns Moves

The researchers tested several popular image encoders and compared their foreground-aware pipeline to earlier methods that rely mainly on skeleton outlines of the athletes. Their best configuration, which combines fine-tuned segmentation with an EfficientNet image backbone and a sequence model, correctly identifies the move in about 83 percent of clips. This is a clear improvement over a strong skeleton-based baseline and over versions of their own system that skip the foreground step. The gains are strongest for moves where bodies are heavily intertwined and the background is especially distracting. Statistical tests across multiple folds of the data confirm that these improvements are unlikely to be due to chance.
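This summary does not say which statistical test the authors used. As one plausible way to compare two systems evaluated on the same folds, the sketch below runs an exact paired sign-flip (permutation) test on per-fold accuracies. The function name and any numbers passed to it are illustrative assumptions, not values from the study.

```python
from itertools import product


def paired_sign_flip_p(scores_a, scores_b):
    """Two-sided exact sign-flip test on paired per-fold scores.

    If the two systems were interchangeable, each per-fold difference would
    be equally likely to carry either sign. We enumerate every sign
    assignment and count how often the summed difference is at least as
    extreme as the one actually observed.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / total
```

One caveat worth knowing: with k folds, the smallest p-value this test can return is 2 / 2^k. Five folds therefore bottom out at 0.0625, so a fold-level comparison needs enough folds, or repeated runs, before it can reach conventional significance thresholds.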

Figure 2.

Trade-Offs, Limits, and Broader Impact

Focusing on the wrestlers does come with a cost: running an extra segmentation step roughly doubles the processing time per clip on the tested hardware. For offline analysis, such as post-match breakdowns or research studies, this overhead is acceptable, but real-time applications may need faster segmentation models or more powerful machines. The authors also note two limitations. The dataset is relatively small, which they counter with transfer learning and data augmentation, and the segmentation step can struggle under extreme motion blur or severe occlusion.

What This Means for Fans and Coaches

In simple terms, the work shows that cleaning up what the computer sees, by carving wrestlers out of the busy scene before analyzing the action, makes it measurably better at naming specific moves: the best configuration gets about 83 percent of clips right. While the current results are tuned to freestyle wrestling, the same idea could carry over to other close-contact sports like judo or Brazilian jiu-jitsu. By releasing both the dataset and the code, the authors provide a foundation for future systems that can break down complex grappling exchanges automatically, helping coaches, athletes, and fans better understand what happens on the mat.

Citation: Rostamian, M., Mottaghi, A. & Soryani, M. A CNN–Bi-LSTM pipeline and open FSW dataset for freestyle wrestling action recognition. Sci Rep 16, 14632 (2026). https://doi.org/10.1038/s41598-026-44782-0

Keywords: freestyle wrestling, action recognition, sports analytics, computer vision, deep learning