Clear Sky Science

Ancient architecture image classification with progressive stacking pseudoinverse learning

Why old buildings meet modern algorithms

Across China, temples and palaces with sweeping roofs and intricate wooden brackets are being photographed in huge numbers. Archivists and conservationists need to sort these images quickly, but doing this by eye is slow and subjective. This paper presents a new way to teach computers to recognize and classify photos of ancient buildings more accurately and efficiently, helping protect cultural heritage in the digital age.

Figure 1.

What makes these buildings hard to tell apart

Ancient Chinese architecture is rich in repeating patterns: curved rooflines, layered bracket sets under the eaves, carved beams, and colorful surface decoration. Many buildings share similar layouts, differing only in subtle shifts of roof curve or bracket form. Standard image-recognition systems, which learn by gradually adjusting internal weights, can be thrown off by these fine-grained differences and by distracting cues such as wall color or lighting. They also tend to overfit to one region or style when trained all at once on a large batch of images, reducing their ability to generalize to buildings from other sites.

A smarter way to look at key details

The authors introduce a framework called ancient architecture image classification with progressive stacking pseudoinverse learning (AAPSP). At its heart is a module dubbed key features stacking pseudoinverse learning (KFSP). Instead of starting from completely random settings, KFSP builds several parallel “base learners,” each initialized with weight patterns designed to match particular visual traits. Two branches are tuned to be especially sensitive to smooth, continuous structures such as roof outlines, while a third is tuned to pick up more scattered textures such as decorative motifs. A mathematical shortcut known as pseudoinverse learning allows these branches to be trained in essentially one shot, avoiding the slow, step-by-step weight updates of traditional deep learning.
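To make the "one-shot" idea concrete, here is a minimal sketch of a single pseudoinverse-trained base learner. The summary does not give the paper's activation, layer sizes or initialisation, so everything below is an assumption: one hidden layer of tanh units whose input weights stay fixed, and output weights found in closed form with the ridge-regularised pseudoinverse. The helpers `make_input_weights`, `train_base_learner` and `predict`, and the kinds `"smooth"` and `"sparse"`, are hypothetical stand-ins for the structured initialisations, not the authors' KFSP code.

```python
import numpy as np


def make_input_weights(n_in, n_hidden, kind, rng):
    """Hypothetical structured initialisation; not the paper's exact scheme."""
    W = rng.standard_normal((n_in, n_hidden))
    if kind == "smooth":
        # Running mean along the input axis, so neighbouring inputs get similar
        # weights: a stand-in for filters that favour continuous outlines.
        kernel = np.ones(5) / 5
        W = np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="same"), 0, W)
    elif kind == "sparse":
        # Keep roughly 20% of the weights: a stand-in for filters that respond
        # to scattered texture. Any other kind keeps the dense random weights.
        W = W * (rng.random(W.shape) < 0.2)
    return W


def train_base_learner(X, Y, kind, n_hidden=64, reg=1e-3, seed=0):
    """Fit one base learner in closed form; Y holds one-hot class labels."""
    rng = np.random.default_rng(seed)
    W = make_input_weights(X.shape[1], n_hidden, kind, rng)
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)  # hidden features; W and b are never updated
    # One-shot output weights: the ridge-regularised pseudoinverse solution
    # beta = (H^T H + reg * I)^-1 H^T Y, with no gradient descent involved.
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return W, b, beta


def predict(model, X):
    """Class scores for each row of X; argmax gives the predicted class."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta
```

Each call such as `train_base_learner(X, Y, "smooth")` yields one branch. As `reg` shrinks toward zero, the solution approaches the plain Moore-Penrose result `np.linalg.pinv(H) @ Y`; a small positive value simply keeps the linear solve well conditioned.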

Letting the model pay attention where it matters

Simply having multiple branches is not enough; the system must also decide which branch is most helpful for each decision. To do this, KFSP uses an attention mechanism that measures how closely each branch’s output matches the true building labels. Branches that better capture telltale elements—such as the shape of a bucket arch or the outline of a ridge ornament—are automatically given more influence when their outputs are combined. This stacked representation forms a feature space that more closely follows the underlying “logic of shape” in ancient architecture, so that buildings with similar structural components cluster together and those with different styles separate more clearly.
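One simple way to turn "how closely each branch's output matches the true labels" into attention weights is sketched below. This is an illustrative assumption, not the paper's formula: each branch is scored by the mean cosine similarity between its outputs and the one-hot labels, the scores are passed through a softmax with a hypothetical `temperature` parameter, and the weighted outputs are concatenated into the stacked representation. The function names `branch_attention` and `stack_branches` are hypothetical.

```python
import numpy as np


def branch_attention(outputs, Y, temperature=0.1):
    """Softmax weights for each branch from its agreement with one-hot labels Y.

    outputs: list of (n_samples, n_classes) arrays, one per branch.
    """
    scores = []
    for O in outputs:
        # Row-wise cosine similarity between branch output and true label vector
        num = np.sum(O * Y, axis=1)
        den = np.linalg.norm(O, axis=1) * np.linalg.norm(Y, axis=1) + 1e-12
        scores.append(np.mean(num / den))
    z = np.array(scores) / temperature
    w = np.exp(z - z.max())  # subtract the max for numerical stability
    return w / w.sum()


def stack_branches(outputs, weights):
    """Concatenate branch outputs, each scaled by its attention weight."""
    return np.hstack([w * O for w, O in zip(weights, outputs)])
```

Because true labels are unavailable for new photos, the weights computed on the training set would be reused at prediction time. A lower `temperature` makes the softmax favour the best-matching branch more sharply.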

Figure 2.

Learning from the most informative photos

The second core module, progressive optimization learning (POL), tackles a different problem: redundant training images. Many photos in the dataset show nearly identical views of the same facade and add little new information. POL begins by splitting the data into an initial training set and a larger candidate pool. Borrowing ideas from active learning, it scores each candidate image on how confidently the current model classifies it and on how unusual its features appear. Photos that are both uncertain and distinctive, such as rare bracket arrangements or unusual roof combinations, are gradually moved into the training set. The cycle then repeats, steadily enriching the training data with challenging and diverse examples without adding any new images to the dataset.
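The selection cycle can be sketched as follows, under loudly stated assumptions: uncertainty is measured as one minus the margin between the two highest softmax probabilities, distinctiveness as the distance from a candidate to its nearest training image (normalised to [0, 1]), and the two are mixed with a hypothetical weight `alpha`. The summary does not give POL's exact criteria, and `select_candidates`, `progressive_loop`, `fit` and `predict` are hypothetical names; `fit` and `predict` are placeholders for whatever classifier is being trained.

```python
import numpy as np


def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # stabilise the exponentials
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def select_candidates(pool_scores, pool_feats, train_feats, k, alpha=0.5):
    """Indices of the k pool samples that are most uncertain and distinctive."""
    P = softmax(pool_scores)
    top2 = np.sort(P, axis=1)[:, -2:]
    uncertainty = 1.0 - (top2[:, 1] - top2[:, 0])  # small margin -> high value
    # Distinctiveness: distance to the nearest image already in the training set
    d = np.linalg.norm(pool_feats[:, None, :] - train_feats[None, :, :], axis=2).min(axis=1)
    distinct = d / (d.max() + 1e-12)
    combined = alpha * uncertainty + (1 - alpha) * distinct
    return np.argsort(-combined)[:k]


def progressive_loop(X_train, y_train, X_pool, y_pool, fit, predict, rounds=3, k=10):
    """Repeatedly retrain, then move the k best candidates from pool to training set."""
    for _ in range(rounds):
        if len(X_pool) == 0:
            break
        model = fit(X_train, y_train)
        picked = select_candidates(predict(model, X_pool), X_pool, X_train,
                                   min(k, len(X_pool)))
        X_train = np.vstack([X_train, X_pool[picked]])
        y_train = np.concatenate([y_train, y_pool[picked]])
        keep = np.setdiff1d(np.arange(len(X_pool)), picked)
        X_pool, y_pool = X_pool[keep], y_pool[keep]
    return X_train, y_train, X_pool, y_pool
```

The total number of images never changes here: samples only move from the pool to the training set, which mirrors the point that POL enriches the training data without enlarging the dataset.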

How well does it work in practice?

The authors tested their approach on a public collection of 2,269 images from six famous temples and palaces. With KFSP alone, the system already outperformed a comparable method that relied on fully random projections. Adding POL's progressive sample selection improved classification accuracy further, and precision, recall, and F1 score all rose. In other words, the model's predictions were more often correct, and it missed fewer images from each category, including the less common ones. The study also highlighted a remaining difficulty: classes with very few images are still hard to classify, because even a well-designed learner struggles when there is too little variety to learn from.
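The metrics mentioned above can be computed from a confusion matrix as in the sketch below. The summary does not say how the paper averages across classes; this example assumes macro averaging (each class counts equally), which is what makes these scores sensitive to how well rare categories are handled. The function name `macro_metrics` is hypothetical.

```python
import numpy as np


def macro_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall and F1."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(float)
    pred_totals = cm.sum(axis=0)
    true_totals = cm.sum(axis=1)
    # Guard against classes that are never predicted or never present
    precision = np.divide(tp, pred_totals, out=np.zeros_like(tp), where=pred_totals > 0)
    recall = np.divide(tp, true_totals, out=np.zeros_like(tp), where=true_totals > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision.mean(), recall.mean(), f1.mean()
```

For example, with `y_true = [0, 0, 0, 0, 1, 1, 2, 2]` and `y_pred = [0, 0, 0, 0, 0, 1, 2, 1]`, accuracy is 0.75 but macro recall is only about 0.667, because each of the two small classes loses half of its images. A plain accuracy figure would hide that.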

Why this matters for cultural heritage

By carefully steering both what the model pays attention to and which images it learns from, AAPSP offers a more precise tool for sorting and studying photos of historic buildings. For heritage professionals, this means faster creation of digital archives, better support for dating and comparing architectural styles, and more robust monitoring of sites spread across different regions. While the method is tailored to Chinese ancient architecture, its core ideas—highlighting key structural details and progressively focusing on rare but informative examples—could be adapted to other kinds of cultural objects, from sculptures to historic streetscapes.

Citation: Cai, Z., Sun, X., Zhang, S. et al. Ancient architecture image classification with progressive stacking pseudoinverse learning. Sci Rep 16, 14626 (2026). https://doi.org/10.1038/s41598-026-44876-9

Keywords: ancient architecture, image classification, cultural heritage, machine learning, active learning