Clear Sky Science · en

YOLO-LSBA: A high-precision model for detecting stems of small-sized cherry tomatoes


Why smarter tomato picking matters

Cherry tomatoes are delicious but surprisingly hard to harvest automatically. Human pickers can see where the delicate stems connect each fruit to the vine and cut them cleanly, but robots struggle with this tiny target hidden among leaves, branches, and changing light. This study presents a new computer vision model that helps robots spot those thin stems accurately and quickly, making fully automated cherry tomato picking more realistic for modern farms.

Figure 1. How cameras and a compact model guide a robot to pick cherry tomatoes by finding their delicate stems.

The challenge of seeing tiny stems

In greenhouses, cherry tomato clusters hang in different directions, overlap with each other, and sit in patchy sun or shade. Most existing vision systems for harvesting focus on detecting the fruits themselves, which are relatively big, round, and colorful. The stems, however, are thin, partly hidden, and easily confused with nearby branches. Yet those stems determine where and how a robot should cut so that it removes ripe fruits without bruising them or tearing the plant. The authors argue that reliable stem detection is the missing link between recognizing tomatoes and actually picking them with a robotic arm.

Building a richer picture from limited data

The researchers started with 3,000 images of tomato plants from a greenhouse in northern China, captured under many lighting conditions and from different angles. They labeled three kinds of objects in each image (ripe fruits, unripe fruits, and stems) and then used data augmentation to expand the dataset more than fourfold. By randomly flipping, cropping, brightening, darkening, and adding visual noise, they created over 12,000 images that mimic real-world variations. This richer collection helps the model learn what stems look like even when they are dim, partly hidden, or surrounded by confusing backgrounds.
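To make the augmentation idea concrete, here is a minimal sketch in plain Python. It is not the authors' pipeline: the function names, the brightness range, and the noise level are all assumptions chosen for illustration, the image is a small grayscale grid, and cropping is left out for brevity. It does show one detail that matters for detection data: when an image is flipped, the labeled boxes have to move with it.

```python
import random

def flip_horizontal(image, boxes):
    """Mirror a grayscale image left-right and move each box with it.

    image: list of rows, each a list of pixel values (0..255).
    boxes: list of (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # A box covering columns [x_min, x_max) lands on [width - x_max, width - x_min).
    new_boxes = [(width - x_max, y_min, width - x_min, y_max)
                 for (x_min, y_min, x_max, y_max) in boxes]
    return flipped, new_boxes

def adjust_brightness(image, factor):
    """Scale every pixel by `factor` and clamp the result to 0..255."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in image]

def add_noise(image, sigma, rng):
    """Add Gaussian noise with standard deviation `sigma`, clamped to 0..255."""
    return [[max(0, min(255, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in image]

def augment(image, boxes, rng):
    """Produce one randomly altered copy of a labeled image (hypothetical settings)."""
    if rng.random() < 0.5:
        image, boxes = flip_horizontal(image, boxes)
    # Factors below 1 darken the image, factors above 1 brighten it.
    image = adjust_brightness(image, rng.uniform(0.6, 1.4))
    image = add_noise(image, sigma=5.0, rng=rng)
    return image, boxes
```

Calling `augment` several times on each original image, with a seeded `random.Random` for reproducibility, is one simple way to grow a dataset severalfold while keeping the labels aligned with the pixels.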

A tuned-up vision model for small details

At the core of the work is an improved version of a popular real-time detector known as YOLO. The new model, called YOLO-LSBA, is tuned specifically for small, fine structures like tomato stems. One part of the upgrade helps the network "look" over a wider area of the image while still keeping track of fine details, which improves its ability to separate stems from leaves and supports. Another part reorganizes how information flows across the width and height of the image and between color channels, trimming away redundant signals so that the model pays more attention to subtle, stem-like patterns. A third component carefully combines features at different scales, preventing the strong signals from large fruits from drowning out the faint signatures of thin stems.
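The idea of combining features at different scales without letting one scale dominate can be sketched in a few lines. The code below is not the YOLO-LSBA fusion module, whose details are not given here. It is a generic illustration, assuming a normalized weighted sum of a fine feature map and an upsampled coarse one; the function names, weights, and the small `eps` constant are hypothetical.

```python
def upsample_nearest(feature_map, factor):
    """Enlarge a 2D feature map by repeating each value `factor` times per axis."""
    out = []
    for row in feature_map:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def fuse(fine, coarse, w_fine, w_coarse, eps=1e-4):
    """Blend a fine map with a coarser one using normalized, non-negative weights.

    Assumes the fine map's side lengths are an integer multiple of the coarse map's.
    """
    factor = len(fine) // len(coarse)
    coarse_up = upsample_nearest(coarse, factor)
    # Clamp weights at zero, then normalize so neither scale can swamp the other.
    wf, wc = max(w_fine, 0.0), max(w_coarse, 0.0)
    total = wf + wc + eps
    return [[(wf * f + wc * c) / total for f, c in zip(fine_row, coarse_row)]
            for fine_row, coarse_row in zip(fine, coarse_up)]
```

In a trained network the weights would be learned, so the model itself can decide how much the fine, stem-scale detail should count relative to the coarse, fruit-scale signal.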

Figure 2. How an AI model gradually isolates thin tomato stems from cluttered images to mark precise cutting points.

Putting the model to the test

The team ran extensive experiments to see how each new component contributed to stem detection. They found that the upgraded architecture significantly improved the precision of stem recognition while keeping the model lightweight enough for the small computers often used on farm robots. On benchmark tests, YOLO-LSBA outperformed several well-known detection models, including other compact YOLO versions and older detectors such as SSD and Faster R-CNN, especially for the difficult stem category. The authors then deployed the model on a Raspberry Pi single-board computer and tested it in greenhouse field trials, where it kept up with video input and accurately marked stems even when fruits overlapped or lighting was poor.

What this means for future farm robots

In simple terms, the study shows that robots can be trained to "see" the fragile stems of cherry tomatoes almost as reliably as a careful human picker, and to do so on modest hardware. The YOLO-LSBA model reaches around 97 percent precision in stem detection while still running fast enough for real-time use. This paves the way for harvesting robots that can approach each tomato cluster, find the safest cutting point, and remove fruits cleanly and gently. While the authors note that more varied field data and long-term tests are still needed, their approach offers a practical blueprint for smarter picking systems not only for tomatoes but also for other clustered crops.
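For readers unfamiliar with the term, precision is the share of the model's detections that turn out to be correct: true positives divided by all predictions. The sketch below shows one common way to compute it for bounding boxes, assuming a detection counts as correct when it overlaps an unmatched labeled box with an intersection-over-union (IoU) of at least 0.5. The threshold and the greedy matching are assumptions for illustration, not the evaluation protocol reported in the study.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def precision(predictions, ground_truth, iou_threshold=0.5):
    """Fraction of predicted boxes that match a distinct ground-truth box."""
    matched = set()
    true_positives = 0
    for pred in predictions:
        best_idx, best_iou = None, iou_threshold
        for idx, truth in enumerate(ground_truth):
            if idx in matched:
                continue  # each labeled stem can justify only one detection
            score = iou(pred, truth)
            if score >= best_iou:
                best_idx, best_iou = idx, score
        if best_idx is not None:
            matched.add(best_idx)
            true_positives += 1
    return true_positives / len(predictions) if predictions else 0.0
```

Under this definition, a precision of around 97 percent means that roughly 97 of every 100 boxes the model marks as stems really are stems.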

Citation: Liu, Q., Chen, F., Zhang, H. et al. YOLO-LSBA: A high-precision model for detecting stems of small-sized cherry tomatoes. Sci Rep 16, 15552 (2026). https://doi.org/10.1038/s41598-026-46348-6

Keywords: cherry tomato harvesting, fruit stem detection, agricultural robotics, computer vision, YOLO model