Clear Sky Science
Bidirectional state space modeling for lightweight and robust wheat head detection in complex agricultural environments
Why counting wheat heads from the sky matters
Feeding a growing world depends on knowing how much food our fields can produce, and one key clue comes from wheat heads, the grain-bearing tips of each plant. Farmers and breeders have long counted these by hand to estimate yield, but doing so across large fields is slow, costly, and prone to error. This study introduces a new computer vision system, Mamba-WheatNet, that uses images from drones to automatically spot and count wheat heads in real time, even when fields are messy, crowded, and lit by shifting sunlight.
Smart eyes over complex farm fields
Real fields are far from tidy. Wheat heads can be hidden by leaves, overlap one another, or be mixed in with weeds and bare soil, and their color and shape change as they grow. Earlier methods relied on simple traits like color or texture, but these break down when the lighting or growth stage changes. Deep learning has improved the situation by letting neural networks learn patterns directly from many images. However, many existing models still struggle with cluttered scenes and often demand more computing power than drones or small field devices can supply.

A lighter model that focuses on what matters
Mamba-WheatNet is designed to handle these challenges while staying light enough for practical farm use. It combines two main ideas: efficient convolutional layers, which are good at capturing local details, and a newer family of models known as selective state space models, which are better at tracking long-range patterns. The authors build a feature extractor that first breaks images down into multi-scale patterns and then passes them through a special fusion stage that looks across the whole scene. This helps the system ignore background clutter, such as overlapping leaves or soil, and pay closer attention to the subtle shapes and shades that mark individual wheat heads.
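To make "efficient convolutional layers" concrete, the short Python sketch below compares the number of weights in a standard convolution with a depthwise separable one, a common way to make such layers lighter. The channel counts and kernel size are illustrative assumptions, not values taken from the paper, and the function names are hypothetical.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes information across channels
    return depthwise + pointwise


# Illustrative layer size (assumed for this example, not from the paper).
standard = standard_conv_params(c_in=64, c_out=128, k=3)
separable = depthwise_separable_params(c_in=64, c_out=128, k=3)
reduction = standard / separable

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights")
print(f"reduction: {reduction:.1f}x fewer weights")
```

For this assumed layer, the separable version needs roughly an eighth of the weights, which is the kind of saving that makes a detector easier to run on small devices.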
How the model untangles dense wheat heads
Inside Mamba-WheatNet, two custom building blocks do much of the heavy lifting. One, called the Residual Depthwise Separable Block, uses lightweight convolutions split across channels to capture fine details without inflating the number of computations. The other, the Bidirectional Spatial Scanning Block, scans feature maps horizontally and vertically, gathering context in both directions while selectively emphasizing the most informative channels. These components are combined into a larger fusion unit that blends local and global cues, making it easier to separate touching or overlapping wheat heads and to maintain accuracy when heads are tiny or densely packed in aerial images.
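The idea of scanning in both directions can be illustrated with a toy example. The Python sketch below is a deliberately simplified stand-in, not the authors' block: it uses a fixed decay instead of learned, input-dependent (selective) parameters, works on a single channel, and simply averages the four scan directions. All function names and the `decay` value are assumptions made for illustration.

```python
from typing import List

Grid = List[List[float]]


def scan(seq: List[float], decay: float) -> List[float]:
    """Running state h_t = decay * h_(t-1) + (1 - decay) * x_t along one sequence."""
    state = 0.0
    out = []
    for x in seq:
        state = decay * state + (1.0 - decay) * x
        out.append(state)
    return out


def bidirectional_scan(seq: List[float], decay: float) -> List[float]:
    """Average a left-to-right scan with a right-to-left scan."""
    forward = scan(seq, decay)
    backward = scan(seq[::-1], decay)[::-1]
    return [(f + b) / 2.0 for f, b in zip(forward, backward)]


def scan_grid(grid: Grid, decay: float = 0.5) -> Grid:
    """Scan every row and every column in both directions, then average."""
    rows = [bidirectional_scan(row, decay) for row in grid]
    # cols[j][i] holds the vertical result at row i, column j.
    cols = [bidirectional_scan(list(col), decay) for col in zip(*grid)]
    height, width = len(grid), len(grid[0])
    return [
        [(rows[i][j] + cols[j][i]) / 2.0 for j in range(width)]
        for i in range(height)
    ]


# A single bright "pixel" in the centre of a 3 x 3 feature map.
feature_map = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]
context = scan_grid(feature_map)
for row in context:
    print(row)
```

A one-way scan would pass the centre value only to positions that come after it. With both directions, the neighbours to the left and right, and above and below, all receive some of that signal, which is the intuition behind gathering context from both sides.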

Putting the system to the test
To see how well their approach works in realistic conditions, the researchers trained and tested Mamba-WheatNet on GWHD-2021, a large public collection of drone images of wheat fields covering many varieties, planting layouts, and growth stages. They compared their system with several leading object detectors, including the latest YOLO models and transformer-based networks. Mamba-WheatNet achieved higher precision and recall and slightly better overall detection scores while using modest computing power and running fast enough for near real-time use. The team also checked how well the model transfers to a different drone dataset, VisDrone2019, which contains many types of small objects in crowded city scenes. There, too, the model delivered the best accuracy among the compact detectors tested, showing that its design generalizes beyond wheat.
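For readers unfamiliar with the terms, precision is the share of predicted boxes that match a real wheat head, and recall is the share of real wheat heads that were found. The Python sketch below shows one simple way to compute both, assuming greedy one-to-one matching at an overlap (IoU) threshold of 0.5. The boxes are made up for illustration, the helper names are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def precision_recall(
    preds: List[Box], truths: List[Box], threshold: float = 0.5
) -> Tuple[float, float]:
    """Greedy matching; preds are assumed sorted by descending confidence."""
    unmatched = list(truths)
    true_positives = 0
    for pred in preds:
        best = max(unmatched, key=lambda t: iou(pred, t), default=None)
        if best is not None and iou(pred, best) >= threshold:
            unmatched.remove(best)  # each real head can be matched only once
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall


# Made-up boxes: three real heads, four predictions (two good, two spurious).
truths = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
preds = [(0, 0, 10, 10), (21, 21, 31, 31), (100, 100, 110, 110), (60, 60, 70, 70)]
precision, recall = precision_recall(preds, truths)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case, two of four predictions are correct and two of three real heads are found, so a detector can miss heads (lower recall) and raise false alarms (lower precision) at the same time.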
What this means for future smart farming
The study concludes that Mamba-WheatNet offers a practical and accurate way to count wheat heads from the air, even when fields are visually complex. By carefully combining lightweight, efficient layers with a global scanning mechanism, the model keeps track of subtle patterns without overwhelming the hardware typically found on drones or field robots. For farmers, breeders, and agronomists, this could mean faster, more reliable yield estimates and better monitoring of crop health over time, helping move agriculture toward more data-driven and precise management.
Citation: Deng, G., Li, Z. & Gao, Z. Bidirectional state space modeling for lightweight and robust wheat head detection in complex agricultural environments. Sci Rep 16, 14895 (2026). https://doi.org/10.1038/s41598-026-45083-2
Keywords: wheat head detection, drone imagery, deep learning, precision agriculture, object detection