Clear Sky Science

Three-stage progressive framework for Dongba ancient texts inpainting


Why Saving Ancient Picture-Writing Matters

For the Naxi people of southwest China, Dongba books are a doorway into a thousand years of stories, rituals, and daily life. These books are written in a rare picture-like script that blends images and words. Time, humidity, and handling have damaged many pages, leaving holes and missing strokes that make the symbols hard to read or even recognize. This study introduces a new digital method to "fill in the gaps" of these fragile texts, aiming to restore both how the writing looks and what it means, and offering a powerful new tool for cultural preservation.

From Broken Pages to Digital Restoration

Conservators have long tried to repair damaged manuscripts physically, but today digital restoration offers an additional path: instead of touching the original, computers can reconstruct missing parts in a scanned image. For ordinary printed text, modern algorithms already do a decent job of guessing lost letters from surrounding shapes and patterns. Dongba books pose a tougher challenge. Each symbol is a small drawing whose lines carry both visual style and meaning. If the software simply completes lines to look smooth, it may accidentally change the symbol into something that never existed, distorting the cultural record. The authors argue that any serious restoration must respect both the artwork-like appearance and the strict rules of the writing system.

Figure 1.

A Three-Step Journey from Outline to Meaning

The research team proposes a three-stage progressive framework, called TsP, designed specifically for heavily damaged Dongba pages. In the first stage, the system focuses only on outlines. It takes the damaged image, detects where strokes once were, and uses a hybrid of two techniques (convolutional networks, which are good at local details, and Transformer networks, which are good at global structure) to roughly rebuild the missing edges. The result is an approximate contour map, like a sketch that hints at the character’s overall shape even where parts are missing.
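To make the idea of an outline-only input concrete, here is a minimal sketch of how the known edges of a damaged character might be extracted before any network fills in the rest. It is not the authors' method: the simple boundary test, the function name `known_edges`, and the convention that `hole` marks missing pixels are all assumptions made for illustration.

```python
import numpy as np

def known_edges(image, hole):
    """Return the stroke boundary pixels that are still observable.

    image: 2-D array, nonzero = ink (hypothetical binarised scan).
    hole:  2-D bool array, True = damaged / missing pixel.
    A surviving ink pixel is an edge if at least one of its four
    neighbours is known background. Missing pixels are treated as
    "unknown", so the rim of a hole does not create false edges.
    """
    hole = hole.astype(bool)
    ink = image.astype(bool) & ~hole
    # Pixels outside the image count as background (padded with False).
    solid = np.pad(ink | hole, 1, constant_values=False)
    surrounded = (solid[:-2, 1:-1] & solid[2:, 1:-1]
                  & solid[1:-1, :-2] & solid[1:-1, 2:])
    return ink & ~surrounded

# Toy character: a filled 3x3 square whose right column has been lost.
char = np.zeros((7, 7), dtype=np.uint8)
char[2:5, 2:5] = 1
hole = np.zeros((7, 7), dtype=bool)
hole[:, 4] = True

edges = known_edges(char, hole)
print(edges.astype(int))
```

In this toy case the edge map keeps the left, top, and bottom boundary of the square and leaves the damaged column empty. A stage-one model would then be asked to complete the contour inside that gap.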

Letting a Digital Dictionary Guide the Repair

In the second stage, the system brings in knowledge about Dongba itself. The researchers built a digital dictionary of commonly used Dongba symbols, including many handwriting styles for each one. The algorithm compares the repaired outline from stage one to all entries in this dictionary and finds the most similar complete character. It does this not by reading text labels, but by measuring how closely the shapes match in a statistical sense. The chosen symbol serves as a “content prior”: a best guess of what the missing character is supposed to be, providing both semantic clues and fine stroke details that a purely visual method would miss.
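The lookup step can be pictured as a nearest-neighbour search over shapes. The sketch below uses cosine similarity between flattened outline images as a stand-in for the statistical shape measure, which this summary does not specify. The names `cosine_similarity` and `retrieve_prior` and the two-entry toy dictionary are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two images viewed as flat vectors."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def retrieve_prior(outline, dictionary):
    """Return (name, score) of the dictionary entry most similar to outline."""
    best_name, best_score = None, -1.0
    for name, template in dictionary.items():
        score = cosine_similarity(outline, template)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Toy dictionary with two 5x5 "symbols" (assumed, for illustration only).
cross = np.zeros((5, 5), dtype=np.uint8)
cross[2, :] = 1
cross[:, 2] = 1
box = np.zeros((5, 5), dtype=np.uint8)
box[0, :] = box[-1, :] = 1
box[:, 0] = box[:, -1] = 1
dictionary = {"cross": cross, "box": box}

# A damaged outline: the cross with its right arm missing.
damaged = cross.copy()
damaged[2, 3:] = 0

name, score = retrieve_prior(damaged, dictionary)
print(name, round(score, 3))  # cross 0.882
```

Even with part of the shape gone, the damaged outline is far closer to the cross than to the box, so the complete cross would be handed to the next stage as the content prior. A real dictionary with many handwriting styles per symbol would likely need a measure that tolerates shifts and deformations better than pixelwise cosine similarity does.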

Polishing the Final Image

In the third and final stage, TsP combines two streams of information: the structural outline from the first step and the full character from the dictionary. A specially designed dual-branch neural network extracts features from both sources, one branch focusing on stroke layout and another on the richer content patterns. These features then guide a restoration module that works not only in image space but also in the frequency domain, where patterns like overall smoothness and rhythm of strokes can be adjusted more effectively. This final pass cleans up artifacts, adds missing parts of strokes, and smooths transitions between old and newly generated regions so that the repaired character blends naturally into the original page.
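Why the frequency domain helps can be shown with a small example. In a Fourier representation, coarse structure sits in the low frequencies and pixel-level jitter in the high ones, so each can be adjusted separately. The sketch below is only a fixed low-pass filter, not the learned restoration module described above; the function name `lowpass` and the cutoff value are assumptions.

```python
import numpy as np

def lowpass(image, cutoff):
    """Keep spatial frequencies with radius <= cutoff (cycles per pixel)."""
    image = np.asarray(image, dtype=float)
    spectrum = np.fft.fft2(image)
    fy = np.fft.fftfreq(image.shape[0])[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(image.shape[1])[None, :]  # horizontal frequencies
    keep = np.sqrt(fx ** 2 + fy ** 2) <= cutoff
    # The mask is symmetric, so the inverse transform is real up to rounding.
    return np.fft.ifft2(spectrum * keep).real

# A checkerboard is pure high-frequency content on top of a flat mean.
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
smoothed = lowpass(checker, cutoff=0.25)
print(np.round(smoothed, 3))
```

The filter removes the pixel-level alternation and leaves only the average brightness of 0.5, something that would take a large spatial kernel to achieve in image space. A learned module could apply the same principle selectively, for example to soften the seam between original and generated strokes.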

Figure 2.

How Well Does It Work?

To test their approach, the authors used DB1404, the only large public dataset of Dongba characters, which includes thousands of symbols captured in many styles. They created digital “damage” of varying severity, masking from just a small portion of each image up to half of it, using irregular holes and scratches that mimic real deterioration. TsP was compared with leading image-repair methods, including classic tools, modern Transformer-based systems, and diffusion models. Across all levels of damage, TsP produced images that were both visually more convincing and structurally closer to the original characters, especially when large portions were missing—exactly the situation that is most critical for rare and fragile manuscripts.
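Synthetic damage of a chosen severity can be produced with a simple procedure. The sketch below draws random-walk "scratches" until a target fraction of the image is covered. It illustrates the general idea only: the article does not describe the authors' mask generator, and the function names, brush size, and jump probability here are assumptions.

```python
import numpy as np

def random_damage_mask(height, width, target_ratio, seed=0, brush=1):
    """Boolean mask (True = missing) covering at least target_ratio of pixels."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((height, width), dtype=bool)
    y, x = int(rng.integers(0, height)), int(rng.integers(0, width))
    while mask.mean() < target_ratio:
        # Stamp a small square brush at the current position.
        mask[max(0, y - brush):y + brush + 1,
             max(0, x - brush):x + brush + 1] = True
        if rng.random() < 0.05:
            # Occasionally jump elsewhere to start a new scratch.
            y, x = int(rng.integers(0, height)), int(rng.integers(0, width))
        else:
            # Otherwise drift by at most one pixel in each direction.
            y = int(np.clip(y + rng.integers(-1, 2), 0, height - 1))
            x = int(np.clip(x + rng.integers(-1, 2), 0, width - 1))
    return mask

def apply_damage(image, mask, fill=255):
    """Return a copy of image with masked pixels set to a blank fill value."""
    damaged = image.copy()
    damaged[mask] = fill
    return damaged

image = np.zeros((64, 64), dtype=np.uint8)  # stand-in for a character image
for ratio in (0.1, 0.3, 0.5):
    mask = random_damage_mask(64, 64, ratio, seed=42)
    damaged = apply_damage(image, mask)
    print(f"target {ratio:.0%}, actual {mask.mean():.1%}")
```

Fixing the seed makes each mask reproducible, so every method under comparison can be given the same damaged inputs at each severity level.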

What This Means for Ancient Writing

In plain terms, this work shows that computers can learn not just to smooth over cracks in an image, but to respect the rules and meanings of an ancient writing system while doing so. By first guessing the skeleton of a damaged character, then matching it to a known symbol, and finally using both as guidance for careful inpainting, TsP better preserves the original form and sense of Dongba script. Beyond a technical achievement, this approach could help librarians, historians, and local communities recover the contents of manuscripts that might otherwise remain unreadable, and it provides a template for restoring other endangered scripts around the world.

Citation: Bi, X., Shi, Q. & Chen, Z. Three-stage progressive framework for Dongba ancient texts inpainting. npj Herit. Sci. 14, 240 (2026). https://doi.org/10.1038/s40494-026-02524-5

Keywords: Dongba manuscripts, ancient script restoration, image inpainting, cultural heritage digitization, deep learning