Clear Sky Science

Mitigating spurious features by contrastive learning in pottery sherd recognition


Why broken pots matter to modern science

At first glance, piles of broken pottery from a 7,000-year-old village in southern China seem far removed from modern artificial intelligence. Yet these fragments are a key to understanding how Neolithic people lived—and they also expose a hidden weakness in today’s image-recognition systems. This study uses advanced machine learning to sort ancient Hemudu pottery sherds into types, while tackling a problem that affects many AI systems: the tendency to latch onto misleading visual “shortcuts” instead of the truly meaningful clues.

Figure 1.

Ancient pots and their hidden stories

The Hemudu archaeological site has yielded around 400,000 pottery fragments, a treasure trove for reconstructing daily life, technology, and trade in Neolithic southern China. Two main kinds of pottery dominate the site. Sand-tempered pottery is packed with sand and gravel, making it dense, hard, and resistant to heat. Charcoal-tempered pottery mixes in burned plant material, leaving tiny pores and ash-like traces that make the vessels lighter and smoother. Archaeologists classify these types mainly by their surface texture and material makeup, not by the irregular outline of each broken piece. Automating this classification could save huge amounts of expert time, but only if the computer focuses on the same clues that specialists trust.

When AI learns the wrong lesson

The researchers built a carefully controlled image collection at the excavation site, photographing 1,864 sherds in a light-proof tent with constant lighting and white backgrounds. Surprisingly, early experiments revealed that a standard deep network could classify the pottery quite well using only the shapes of the fragments, achieving high accuracy on binarized, outline-only images. By contrast, when the researchers cropped away the edges and kept only the inner surface texture, the very cue archaeologists rely on, accuracy dropped. This meant the model had discovered an easy but untrustworthy shortcut: the specific broken shapes, which archaeologists view as random accidents of breakage, not reliable markers of pottery type. In machine-learning terms, fragment shape was acting as a “spurious feature”: a pattern that correlates with the label in the dataset but is not truly linked to the underlying category.
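The two diagnostic views described above, a shape-only silhouette and a texture-only inner crop, can be sketched in a few lines. This is an illustrative sketch in plain Python, not the researchers' pipeline: the function names, the brightness threshold of 0.9, and the 50% crop fraction are all assumptions, and a real workflow would use an image library on actual photographs.

```python
from typing import List

Image = List[List[float]]  # grayscale image: rows of pixel intensities in [0, 1]


def silhouette(img: Image, background_threshold: float = 0.9) -> List[List[int]]:
    """Shape-only view: 1 where a pixel is darker than the white background, else 0.

    The threshold value is an assumption chosen for this toy example.
    """
    return [[1 if px < background_threshold else 0 for px in row] for row in img]


def center_crop(img: Image, keep_fraction: float = 0.5) -> Image:
    """Texture-only view: keep the central region so the sherd's outline is cut away."""
    height, width = len(img), len(img[0])
    crop_h = max(1, int(height * keep_fraction))
    crop_w = max(1, int(width * keep_fraction))
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]


# Toy 6x6 "photo": white background (1.0) with a darker 4x4 sherd in the middle.
toy = [[1.0] * 6 for _ in range(6)]
for r in range(1, 5):
    for c in range(1, 5):
        toy[r][c] = 0.3 + 0.05 * ((r + c) % 3)  # mild texture variation

shape_only = silhouette(toy)                         # outline information only
texture_only = center_crop(toy, keep_fraction=0.5)   # surface information only
```

Training one classifier on each view and comparing the accuracies is the kind of check that reveals whether a model is leaning on outline or on surface.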

Teaching the model to look past the shortcut

To push the system toward more meaningful cues, the team designed a training strategy based on contrastive learning, a technique that teaches a model which images should be considered “similar” or “different.” For each pottery photo, they created a version that was randomly cropped so that much of the outline vanished while the internal surface remained. Both images were passed through the same feature-extracting network, and the training process forced their internal representations to move closer together. At the same time, images from different pottery types were pushed farther apart in this feature space. A specialized “Triplet-center” loss function tightened clusters of sherds from the same class and separated the clusters of sand-tempered and charcoal-tempered pieces, even when their textures looked quite alike to the naked eye.
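The two pressures described above, pulling the full and cropped views together while separating the classes, can be written as two small loss terms. The sketch below is a minimal illustration under stated assumptions: the article does not give the exact formulas, so the triplet-center term follows the commonly used form (distance to the own-class center, plus a margin, minus the distance to the nearest other center, clipped at zero), and the alignment term is simply a squared distance. The two-dimensional features, center positions, and margin of 1.0 are made up for the example.

```python
from typing import Dict, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_center_loss(feature: Vector, label: str,
                        centers: Dict[str, Vector], margin: float = 1.0) -> float:
    """Zero once the feature is closer to its own class center than to the
    nearest other center by at least `margin`; positive otherwise."""
    own = sq_dist(feature, centers[label])
    nearest_other = min(
        sq_dist(feature, center) for name, center in centers.items() if name != label
    )
    return max(0.0, own + margin - nearest_other)


def pair_alignment_loss(full_view: Vector, cropped_view: Vector) -> float:
    """Pulls the features of a photo and its outline-free crop together."""
    return sq_dist(full_view, cropped_view)


# Hypothetical class centers in a 2-D feature space.
centers = {"sand": (0.0, 0.0), "charcoal": (4.0, 0.0)}

well_placed = triplet_center_loss((0.5, 0.0), "sand", centers)  # 0.0
ambiguous = triplet_center_loss((2.0, 0.0), "sand", centers)    # 1.0
alignment = pair_alignment_loss((0.5, 0.0), (1.0, 0.0))         # 0.25
```

A sherd whose feature sits midway between the two centers is penalized by the full margin, while one already deep inside its own cluster contributes nothing, so training effort concentrates on the confusable pieces.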

Figure 2.

Making learning more stable and reliable

After shaping this feature space, the researchers froze it and trained a simple classifier on top. To avoid the familiar pitfall of overfitting—doing extremely well on training data but faltering on new samples—they used a technique called flooding. Instead of driving the training error all the way to zero, flooding deliberately holds the loss at a small, non-zero level, encouraging the model to settle into a broad, flat region of solutions that tends to generalize better. They also tested many common data-augmentation tricks, such as color changes and blurring. Alterations that disturbed texture information generally hurt performance, while those that disrupted shape—like horizontal flips and carefully tuned random crops—helped the model ignore the misleading outline cues.
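Flooding is usually written as a one-line change to the loss: replace the loss L with |L - b| + b, where b is the flood level. Above b the model trains as usual; below b the gradient flips sign, so the loss drifts back up instead of collapsing to zero. The sketch below illustrates this on a toy one-parameter problem. The quadratic loss, the flood level of 0.04, and the learning rate are assumptions chosen for the demonstration, not values from the study.

```python
from typing import Optional


def flooded_loss(loss: float, flood_level: float) -> float:
    """Flooding objective: equals `loss` above the flood level, mirrored below it."""
    return abs(loss - flood_level) + flood_level


def train(flood_level: Optional[float], steps: int = 50,
          learning_rate: float = 0.1) -> float:
    """Gradient descent on the toy loss w**2; returns the final raw loss."""
    w = 1.0
    for _ in range(steps):
        loss = w ** 2
        grad = 2.0 * w
        # Below the flood level the flooded objective has the opposite
        # gradient, so the update becomes a step of gradient ascent.
        if flood_level is not None and loss <= flood_level:
            grad = -grad
        w -= learning_rate * grad
    return w ** 2


plain_final = train(flood_level=None)    # shrinks toward zero
flooded_final = train(flood_level=0.04)  # hovers near the flood level
```

Without flooding the toy loss falls to practically zero, whereas with flooding it keeps oscillating in a narrow band around 0.04, which is the "small, non-zero level" the technique aims for.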

What this means for archaeology and AI

With this combination of contrastive training, Triplet-center loss, and flooding, the system reached 97.3% accuracy on the Hemudu pottery dataset, beating several well-known image-recognition models. The method also improved performance on a separate benchmark where object types appear in new, unfamiliar backgrounds, suggesting it can help many vision systems resist spurious correlations. For archaeologists, such tools promise faster, more consistent sorting of vast sherd collections, freeing experts to focus on interpretation instead of repetitive labeling. For a lay reader, the takeaway is clear: by forcing AI to look past convenient but unreliable shortcuts—like the jagged outline of a broken pot—we can build systems that see the world in ways that are closer to how human experts understand it.

Citation: Yu, X., Li, T., Song, Z. et al. Mitigating spurious features by contrastive learning in pottery sherd recognition. npj Herit. Sci. 14, 135 (2026). https://doi.org/10.1038/s40494-025-02170-3

Keywords: Hemudu pottery, contrastive learning, spurious correlations, archaeological imaging, image classification