Clear Sky Science
Small target detection of floating objects in river channels based on improved YOLOv7
Why spotting tiny trash in rivers matters
Rivers and canals often carry tiny pieces of litter—bottles, leaves, plastic fragments—that are hard to see but can cause big problems for ecosystems, flood safety, and human infrastructure. Drones and fixed cameras promise continuous monitoring, yet even advanced computer programs struggle to pick out these small, fast-moving objects from shimmering, ever-changing water. This study presents a new way to teach computers to find such tiny floating items in river scenes more accurately and quickly, opening doors for cleaner waterways and safer operations.
The challenge of seeing through moving water
When you watch a river on video, your eye quickly notices floating debris, even as sunlight flashes off the surface and waves ripple unpredictably. For a computer, this is much harder. The shapes of small targets change as they bob on the water, reflections mimic bright objects, and shadows can hide dim ones. Standard detection systems draw boxes around anything that might be an object in each video frame, but those boxes shift and flicker from frame to frame. That instability wastes computing effort and makes it easy to lose track of small items entirely. The result is a mix of missed detections, false alarms, and slow processing, especially when thousands of frames must be analyzed in real time.

A smarter way to agree on what is really there
The authors propose a new framework called Region-Overlap Detection combined with a trimmed-down version of a popular detector known as YOLOv7. Instead of treating every frame separately, the system looks at several consecutive frames and asks a simple question: where do the boxes line up over time? Areas where boxes consistently overlap are treated as more trustworthy than those that appear only briefly or jump around. By focusing first on this stable overlap region, the method filters out many noisy and unstable guesses about where an object might be. Only the most reliable boxes are passed down the pipeline for deeper analysis, giving the system a cleaner, steadier view of the scene before it does any heavy computation.
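The idea of trusting boxes that line up over time can be illustrated with a small sketch. This is not the authors' implementation: the function names, the intersection-over-union (IoU) threshold of 0.5, and the "matched in at least two earlier frames" rule are all assumptions chosen only to make the principle concrete.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) in pixels. This layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def stable_boxes(frames: List[List[Box]],
                 min_iou: float = 0.5,
                 min_hits: int = 2) -> List[Box]:
    """Keep boxes from the newest frame that overlap a box in at least
    `min_hits` of the earlier frames (hypothetical stability rule)."""
    *history, current = frames  # needs at least one frame
    kept = []
    for box in current:
        # Count earlier frames containing a sufficiently overlapping box.
        hits = sum(
            1 for past in history
            if any(iou(box, p) >= min_iou for p in past)
        )
        if hits >= min_hits:
            kept.append(box)
    return kept


frames = [
    [(10, 10, 20, 20), (50, 50, 60, 60)],  # oldest frame
    [(11, 10, 21, 20)],
    [(10, 11, 20, 21), (80, 80, 90, 90)],  # newest frame
]
print(stable_boxes(frames))
```

In this toy input, the box near (10, 10) recurs in every frame and is kept, while the box at (80, 80) appears only once and is discarded as unstable.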
Doing more with fewer network steps
Modern vision systems often rely on deep stacks of processing layers that learn to recognize shapes, edges, and textures. While powerful, these layers are expensive to run and can wash out the delicate signals from tiny objects. The new method keeps the overall YOLOv7 idea but deliberately uses fewer of these processing steps, activating them only where the overlap-based analysis suggests that a real object is present. Layers that would mostly see background water or random noise are skipped. This “minimum convolution” strategy reduces the total amount of calculation while preserving the crisp boundaries around small floating items. In effect, the network concentrates its effort where it matters most, rather than treating every pixel equally.
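A rough way to picture the savings is to divide the frame into tiles and run the expensive processing only on tiles touched by a stable box. The sketch below is a loose analogy, not the paper's "minimum convolution" design: the tiling scheme, the tile size of 32 pixels, and the function names are all hypothetical.

```python
import math
from typing import Iterable, Set, Tuple

# A box is (x1, y1, x2, y2) in pixels. This layout is an assumption.
Box = Tuple[float, float, float, float]


def active_tiles(boxes: Iterable[Box], width: int, height: int,
                 tile: int = 32) -> Set[Tuple[int, int]]:
    """Return the (row, col) indices of tiles that intersect any box."""
    n_cols = math.ceil(width / tile)
    n_rows = math.ceil(height / tile)
    tiles = set()
    for x1, y1, x2, y2 in boxes:
        # Clamp the covered tile range to the image bounds.
        c0 = max(0, math.floor(x1 / tile))
        c1 = min(n_cols - 1, math.ceil(x2 / tile) - 1)
        r0 = max(0, math.floor(y1 / tile))
        r1 = min(n_rows - 1, math.ceil(y2 / tile) - 1)
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                tiles.add((r, c))
    return tiles


def compute_fraction(boxes: Iterable[Box], width: int, height: int,
                     tile: int = 32) -> float:
    """Fraction of tiles that would still receive heavy processing."""
    total = math.ceil(width / tile) * math.ceil(height / tile)
    return len(active_tiles(boxes, width, height, tile)) / total


# One small stable box in a 128 x 64 frame (8 tiles of 32 x 32 pixels).
print(compute_fraction([(10, 10, 20, 20)], width=128, height=64))
```

With a single small object, only one tile in eight stays active in this toy frame. A real detector would see smaller savings, because a network needs surrounding context and boxes often straddle tile borders.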

Putting the method to the test on real rivers
To see how well this approach works in practice, the team trained and tested it on drone videos of actual rivers, using a large dataset of thousands of annotated images containing nearly forty thousand floating objects of different sizes. They also checked performance on additional public datasets and long river video sequences with changing light, water flow, and viewing angles. Compared with the original YOLOv7 and several newer detectors, the new system found more genuine objects, missed fewer, and analyzed frames faster. The study reports a mean average precision above 73 percent and recall above 70 percent for small floating objects, along with a noticeable gain in processing speed and a reduction in the number of network parameters and operations required.
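For readers unfamiliar with the reported metrics, the sketch below shows how precision and recall can be computed for one image at a single IoU threshold. It is a simplified illustration under assumed conventions (greedy matching in descending score order, threshold 0.5), not the evaluation code used in the study. Mean average precision additionally averages precision over many confidence levels and object classes, which is omitted here.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) in pixels. This layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Tuple[float, Box]],
                     truths: List[Box],
                     min_iou: float = 0.5) -> Tuple[float, float]:
    """Greedily match predictions (highest score first) to ground truth.

    Each ground-truth box can be matched at most once.
    """
    matched = set()
    true_pos = 0
    for _score, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = -1, min_iou
        for j, truth in enumerate(truths):
            if j in matched:
                continue
            value = iou(box, truth)
            if value >= best_iou:
                best_j, best_iou = j, value
        if best_j >= 0:
            matched.add(best_j)
            true_pos += 1
    precision = true_pos / len(preds) if preds else 0.0
    recall = true_pos / len(truths) if truths else 0.0
    return precision, recall


truths = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
preds = [
    (0.9, (0, 0, 10, 10)),    # exact hit
    (0.8, (21, 21, 31, 31)),  # close enough to count as a hit
    (0.3, (70, 70, 80, 80)),  # false alarm
]
print(precision_recall(preds, truths))
```

Here two of three predictions are correct and two of three real objects are found, so both precision and recall come out to two thirds.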
What this means for cleaner and safer waterways
In simple terms, the paper shows that stabilizing what the computer “thinks it sees” across frames, then trimming away unnecessary processing, makes it much better at spotting tiny pieces of debris moving on lively water surfaces. While the method still needs testing in a wider range of rivers and conditions, it already outperforms several well-known models on challenging river scenes. That makes it a promising building block for real-time monitoring systems mounted on drones, bridges, or riverbank stations. Such systems could help cities and environmental agencies track litter, manage flood risks, and respond quickly to pollution events, turning raw video feeds into reliable, actionable information.
Citation: Yang, W., Zhang, B., Guo, S. et al. Small target detection of floating objects in river channels based on improved YOLOv7. Sci Rep 16, 11423 (2026). https://doi.org/10.1038/s41598-026-40688-z
Keywords: river trash detection, drone river monitoring, small object detection, computer vision for water, YOLOv7 improvements