Clear Sky Science · en

Enhanced content-based image retrieval via hybrid color, texture, and deep learning features


Why finding the right picture matters

From medical scans to holiday photos, our lives are flooded with images. Yet actually finding the one picture we need in a huge collection can be surprisingly hard. This study introduces CTD-Net, a new way for computers to search large image databases by looking directly at what is in the picture rather than relying only on tags or file names. The work shows how mixing classic image analysis with modern deep learning can make visual search both more accurate and more useful in real-world settings.

Figure 1. How a smart image search system combines picture content and AI to find the closest matching photos in a large collection.

How computers usually search through images

Early image search tools depended on human-added text such as captions and keywords. That approach is slow, costly, and often incomplete, since different people describe the same scene in different ways. Content-based image retrieval takes a different route: the computer analyzes the colors, shapes, and textures inside each picture. However, many existing systems still fall short on complex scenes. Simple color or texture formulas can miss important details, while pure deep learning models may need huge datasets and are sometimes hard to interpret. The result is a gap between what the computer sees as numbers and what people recognize as meaningful content.

Blending simple picture clues with deep learning

CTD-Net tackles this gap by combining two kinds of clues from each image. First, it extracts handcrafted features that describe basic visual properties. Color histograms and color moments summarize how shades are spread across the picture, while wavelet transforms and local binary patterns capture fine texture patterns and edges. Second, the system feeds the same image into a powerful deep neural network called EfficientNet-B7, which learns more abstract patterns such as object parts and complex layouts. All of these signals are carefully scaled and merged into a single long feature vector that captures both simple appearance and richer scene meaning.
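For readers who want to see what these handcrafted clues look like, here is a minimal sketch in plain Python. It is not the authors' code. The 2-level color quantization, the basic 8-neighbor local binary pattern, the single-level Haar wavelet, the per-group L2 scaling, and all function names are illustrative assumptions. The EfficientNet-B7 output is represented by a hypothetical `deep_feats` list, because running the real network requires a deep learning library and pretrained weights.

```python
import math

# Images are plain lists of rows; each pixel is an (R, G, B) tuple in 0..255.


def color_histogram(img, bins=2):
    """Joint RGB histogram with `bins` levels per channel, normalized to sum to 1."""
    hist = [0.0] * bins ** 3
    pixels = [p for row in img for p in row]
    for r, g, b in pixels:
        idx = ((r * bins) // 256) * bins * bins + ((g * bins) // 256) * bins + (b * bins) // 256
        hist[idx] += 1
    return [h / len(pixels) for h in hist]


def color_moments(img):
    """Mean, standard deviation and cube-rooted skewness of each channel: 9 values."""
    pixels = [p for row in img for p in row]
    n = len(pixels)
    feats = []
    for c in range(3):
        vals = [p[c] for p in pixels]
        mean = sum(vals) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / n)
        third = sum((v - mean) ** 3 for v in vals) / n
        feats += [mean, std, math.copysign(abs(third) ** (1 / 3), third)]
    return feats


def to_gray(img):
    """Grayscale intensity using the ITU-R BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in img]


def lbp_histogram(gray):
    """Normalized 256-bin histogram of basic 8-neighbor local binary pattern codes."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0.0] * 256
    count = 0
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if gray[y + dy][x + dx] >= gray[y][x]:  # neighbor at least as bright
                    code |= 1 << bit
            hist[code] += 1
            count += 1
    return [h / count for h in hist] if count else hist


def haar_energies(gray):
    """Mean energy of the approximation and three detail subbands of a
    one-level 2-D Haar transform (an odd trailing row/column is ignored)."""
    sums = [0.0, 0.0, 0.0, 0.0]
    blocks = 0
    for y in range(0, len(gray) - 1, 2):
        for x in range(0, len(gray[0]) - 1, 2):
            a, b = gray[y][x], gray[y][x + 1]
            c, d = gray[y + 1][x], gray[y + 1][x + 1]
            bands = ((a + b + c + d) / 4, (a - b + c - d) / 4,
                     (a + b - c - d) / 4, (a - b - c + d) / 4)
            for i, v in enumerate(bands):
                sums[i] += v * v
            blocks += 1
    return [s / blocks for s in sums] if blocks else sums


def l2_normalize(vec):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else list(vec)


def fused_descriptor(img, deep_feats):
    """Scale each feature group to unit length, then concatenate them.
    `deep_feats` stands in for an EfficientNet-B7 embedding (hypothetical input)."""
    gray = to_gray(img)
    groups = [color_histogram(img), color_moments(img),
              haar_energies(gray), lbp_histogram(gray), deep_feats]
    fused = []
    for group in groups:
        fused += l2_normalize(group)
    return fused
```

Scaling each group separately before concatenation is one plausible reading of "carefully scaled": without it, the 256 texture bins or the large raw color means could swamp the other groups when vectors are compared.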

Figure 2. How color, texture, and deep neural network features merge to compare images and rank the most similar search results.

Turning features into better search results

Once each image has its combined fingerprint, CTD-Net measures how similar any two fingerprints are. The authors tested several mathematical ways to compare them and found that cosine similarity gave the most reliable matches. In practice, a user submits a query image, CTD-Net converts it into features, then ranks all database images based on how close their feature vectors are. The team evaluated performance on three well-known collections: Corel-1K, Corel-10K, and Caltech-101, which together cover natural scenes, man-made objects, and many different categories and image conditions.
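The ranking step can be sketched in a few lines. This is an illustrative example, not CTD-Net's implementation: it scores every stored fingerprint against the query with cosine similarity and returns the best k. The image ids and three-number vectors are made up; real fused vectors would be far longer.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, database, k=10):
    """Return the ids of the k database images most similar to the query.
    `database` maps a hypothetical image id to its stored feature vector."""
    scored = [(cosine_similarity(query_vec, vec), image_id)
              for image_id, vec in database.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return [image_id for _, image_id in scored[:k]]


db = {
    "beach_1": [0.9, 0.1, 0.0],
    "beach_2": [0.8, 0.2, 0.1],
    "bus_1": [0.0, 0.1, 0.9],
}
print(retrieve([1.0, 0.0, 0.0], db, k=2))  # ['beach_1', 'beach_2']
```

Cosine similarity depends only on the angle between two vectors, so multiplying a whole fingerprint by a positive constant does not change its rank, whereas Euclidean distance would. That is one plausible reason it suits long fused vectors, though the article does not say why it performed best in the authors' comparison.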

How well the new system performs

Across all three datasets, CTD-Net consistently outperformed systems based only on handcrafted features, only on deep learning, or on simpler hybrids. It reached precision values close to 99 percent on Corel-1K, above 92 percent on Corel-10K, and nearly 89 percent on the more challenging Caltech-101 set. These gains held up even when more results were returned per query and when compared with many recent research methods. Although the hybrid features are larger and take more computation, the authors show that search times remain practical, especially for batch or server-based use where accuracy is crucial.
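The headline figures are precision scores. A common convention in image retrieval, and the assumption behind this sketch, is precision at k: among the first k images returned, the fraction that belong to the same category as the query, averaged over many queries. Returning more results per query corresponds to a larger k. The exact k values and averaging protocol used in the paper are not given in this article, and the ids and labels below are invented for illustration.

```python
def precision_at_k(ranked_ids, labels, query_label, k):
    """Fraction of the top-k results whose category matches the query's category."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for image_id in ranked_ids[:k] if labels[image_id] == query_label)
    return hits / k


def precision_at_k_over_queries(runs, labels, k):
    """Average precision@k over queries; `runs` holds (query_label, ranked_ids) pairs."""
    return sum(precision_at_k(ids, labels, q, k) for q, ids in runs) / len(runs)


labels = {"a": "beach", "b": "beach", "c": "bus", "d": "beach"}
ranked = ["a", "c", "b", "d"]
print(precision_at_k(ranked, labels, "beach", 2))  # 0.5
print(precision_at_k(ranked, labels, "beach", 4))  # 0.75
```

Note that this simple averaged precision is different from mean average precision (mAP), which also rewards placing relevant images near the top of the list.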

What this means for everyday image search

For a non-specialist, the message is that image search is starting to work more like human picture recognition. By blending straightforward color and texture measurements with deeper learned understanding, CTD-Net can find images that genuinely look and feel similar to a query photo, not just those that share a keyword. This could speed up tasks such as finding similar medical scans, matching artwork or historical photos, or refining product search in online shops. The authors suggest that future work could adapt the same idea to even larger collections and new types of images, making visual search faster, more accurate, and easier to trust.

Citation: Tyagi, S., Shukla, P., Singh, P. et al. Enhanced content-based image retrieval via hybrid color, texture, and deep learning features. Sci Rep 16, 14888 (2026). https://doi.org/10.1038/s41598-026-38422-w

Keywords: content-based image retrieval, image search, deep learning, image features, visual similarity