Clear Sky Science

A real-world framework for automated product recognition and catalog generation: dataset, model, and analysis


Smarter Store Shelves for Busy Shoppers

Anyone who has hunted for a specific cereal box or tried a self-checkout knows that store shelves are crowded, confusing places. This paper explores how computers can look at everyday grocery shelves and automatically recognize what is there, using ordinary photos instead of barcodes. The goal is to make tasks like inventory counting, catalog creation, and even phone-based product lookup faster, cheaper, and less dependent on manual work.

Figure 1. How a phone photo of store shelves can turn into an automatic list of products for retailers and shoppers

Why Shelves Are Hard for Computers

At first glance, teaching a computer to spot products might sound simple: just show it lots of pictures of each item. In reality, supermarket scenes are messy. Products appear at many sizes, from close-up shots in a shopper’s hand to distant views from security cameras. Packages look similar, differ by small details, and can be partly hidden behind others. Lighting changes, shelves are reorganized, and brands vary from one region to another. Existing image collections for research often skip these headaches, using small numbers of products, controlled lighting, or only close-up images. That makes it hard to develop systems that truly work in real stores.

A New, Realistic Grocery Image Collection

To close this gap, the authors built a new image collection called Grocer-Help. It contains 13,771 pictures showing about 4,000 distinct grocery products grouped into 349 brand-based classes. The images come from eight stores across five Indian states, captured with six types of mobile cameras. Scenes range from close-up shots of a few items to long-shot views of full aisles, and include everyday quirks like glare, motion blur, cluttered backgrounds, and partial blocking of labels. Each product in an image is marked with a bounding box, yielding more than 166,000 annotated items. The images fall into three capture types: close-shot, long-shot, and clean online catalog pictures; together these let researchers study how viewing distance and capture style affect recognition.
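To make the idea of box annotations concrete: the article does not describe Grocer-Help's actual file format, but a dataset like this is typically stored as one record per box, tagged with its image, capture type, and brand class. The sketch below uses made-up record fields and product names purely for illustration.

```python
from collections import Counter

# Hypothetical, COCO-style annotation records; Grocer-Help's real schema
# is not given in this summary. Each bbox is (x, y, width, height) in pixels.
annotations = [
    {"image": "store3_aisle2.jpg", "capture": "close-shot",
     "brand_class": "SunWheat Flakes", "bbox": (120, 45, 90, 160)},
    {"image": "store3_aisle2.jpg", "capture": "close-shot",
     "brand_class": "SunWheat Flakes", "bbox": (215, 47, 88, 158)},
    {"image": "store7_full_aisle.jpg", "capture": "long-shot",
     "brand_class": "HillTop Tea", "bbox": (640, 300, 22, 40)},
    {"image": "catalog_00123.jpg", "capture": "catalog",
     "brand_class": "HillTop Tea", "bbox": (10, 10, 300, 500)},
]

def summarize(anns):
    """Count annotated boxes per capture type and per brand class."""
    by_capture = Counter(a["capture"] for a in anns)
    by_class = Counter(a["brand_class"] for a in anns)
    return by_capture, by_class

by_capture, by_class = summarize(annotations)
```

Summaries like this are how one would verify the split sizes and per-class counts reported for a dataset before training on it.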

A Lean Model That Sees at Many Scales

Alongside the dataset, the authors introduce a compact detection model designed to handle products at many sizes in the same scene. Instead of treating small and large items separately, the model uses an omni-scale building block that gathers visual clues across several scales at once. It then stacks these clues into a pyramid of feature maps, where each layer focuses on a different level of detail. This helps the system follow products from far-away shelf views down to fine differences between similar packages. The model is also built to be efficient: it uses lighter-weight operations so it can run on devices with limited computing power, making it more suitable for use in stores or on consumer hardware.
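The exact design of the paper's building block is not reproduced here, but the core idea of "seeing at many scales" can be shown with a toy sketch: the same feature map is pooled with several window sizes, producing one coarse-to-fine view per scale. All names and window sizes below are illustrative, not the authors' implementation.

```python
def avg_pool(grid, k):
    """Average-pool a 2D feature map with a k x k window and stride k."""
    h, w = len(grid), len(grid[0])
    out = []
    for i in range(0, h - k + 1, k):
        row = []
        for j in range(0, w - k + 1, k):
            window = [grid[i + di][j + dj] for di in range(k) for dj in range(k)]
            row.append(sum(window) / (k * k))
        out.append(row)
    return out

def omni_scale_features(grid, scales=(1, 2, 4)):
    """Collect pooled views of one feature map at several scales.

    This mimics, in spirit only, a block that gathers clues at multiple
    receptive-field sizes before stacking them into a feature pyramid:
    the finest view keeps package details, the coarsest summarizes the shelf.
    """
    return {k: avg_pool(grid, k) for k in scales}

# A tiny 4x4 "feature map" standing in for a real network activation.
feature_map = [[float(i + j) for j in range(4)] for i in range(4)]
pyramid = omni_scale_features(feature_map)
```

In a real detector the pooled views would be produced by learned convolutions and fused, but the principle is the same: each pyramid level trades spatial detail for context.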

Figure 2. How a vision model combines details at many sizes to draw boxes around grocery items on crowded shelves

Testing Across Datasets, Stores, and Distances

The researchers compare their model with popular object detection systems such as various versions of YOLO and RetinaNet on several existing grocery datasets and on Grocer-Help. On the new dataset, their model finds products with accuracy competitive with its rivals while using fewer parameters than many of them. It achieves particularly strong precision and recall, meaning it is good at both avoiding false alarms and not missing items, though its boxes are sometimes less tight when judged by very strict overlap rules. Detailed tests reveal that performance depends on how images are captured: close-up images are easiest, long-range shelf views are harder, and mixing online catalog pictures into training can harm results because they look so different from real store scenes. Store-by-store comparisons also show that neat shelves and box-style packaging tend to help the detector.
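The "strict overlap rules" mentioned above refer to intersection-over-union (IoU): a predicted box only counts as correct if it overlaps a true box by at least some threshold. The sketch below, a simplified stand-in for the paper's actual evaluation protocol, shows how the same predictions can score well at a lenient threshold yet fail at a strict one when the boxes are slightly loose.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def precision_recall(preds, truths, thresh):
    """Greedy one-to-one matching of predictions to ground-truth boxes.

    A prediction is a true positive only if it overlaps an unmatched
    ground-truth box with IoU >= thresh, so stricter thresholds punish
    loose boxes even when the product itself was found.
    """
    matched = set()
    tp = 0
    for p in preds:
        for i, t in enumerate(truths):
            if i not in matched and iou(p, t) >= thresh:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(truths) if truths else 0.0
    return precision, recall

# Two true products; one prediction slightly offset, one badly placed.
truths = [(0, 0, 100, 100), (200, 0, 300, 100)]
preds = [(5, 5, 105, 105), (150, 0, 250, 100)]
```

Here the first prediction has IoU of about 0.82 with its target, so it passes a 0.5 threshold but fails a 0.9 one, mirroring how a detector can look strong under lenient rules and weaker under strict ones.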

What This Means for Everyday Retail

In plain terms, this work shows how to move beyond simple barcode scanning toward camera-based systems that can “see” crowded store shelves. By offering a large, realistic dataset and an efficient model that handles products at different sizes and viewpoints, the study provides a foundation for practical tools like automatic inventory checks, shelf-based catalog building, and smarter mobile shopping apps. While there are still challenges, especially in tightly packed scenes and for products seen only a few times in training, Grocer-Help and the omni-scale model bring automated product recognition closer to everyday use in real-world retail.

Citation: Sah, M., Mathew, J. & Dayananda, P. A real-world framework for automated product recognition and catalog generation: dataset, model, and analysis. Sci Rep 16, 14834 (2026). https://doi.org/10.1038/s41598-026-42266-9

Keywords: grocery product recognition, object detection, computer vision retail, dataset benchmark, inventory automation