Clear Sky Science
Multimodal interaction enhancement of digital cultural heritage system: user behavior analysis and interface reconstruction of the heritage scanning library of the palace museum
Bringing the Forbidden City to Your Screen
The Palace Museum in Beijing, home to the treasures of the Forbidden City, has poured enormous effort into digitizing its artworks and artifacts. Visitors anywhere in the world can now zoom in on exquisite details that once required a magnifying glass and a special visit. Yet many people still click away after only a few minutes, feeling impressed by the pictures but unsure what they have really learned. This study asks a simple question with big implications: how can a digital museum be redesigned so that ordinary visitors, not just experts, truly feel and understand the culture behind the images?

From Sharp Pictures to Shallow Understanding
The researchers begin by noting a tension at the heart of many digital heritage projects. High‑precision scanning and 3D models capture every crack in a ceramic glaze or every stroke in a scroll painting. But the online systems that display these marvels often treat users as passive viewers. Interaction is mostly limited to rotating, zooming, and scrolling through long, technical descriptions. As a result, rich cultural meanings are buried under specialist terms, and most visitors end up "seeing things without knowing their stories." The Palace Museum’s digital relics library is a prime example: technically impressive, but narratively fragmented and hard to navigate for non‑experts.
Watching Eyes to Understand Minds
To uncover what different visitors actually do on these pages, the team ran eye‑tracking experiments with three groups: professional scholars, history enthusiasts, and general tourists. Participants completed tasks ranging from free exploration to targeted searches and complex operations such as comparing related artifacts. Tiny cameras in special glasses recorded where their eyes landed, how long they lingered, and how their gaze jumped across the screen. At the same time, software logged mouse clicks and scrolling. After each session, the volunteers rated how mentally demanding the tasks had felt and took part in in‑depth interviews about what had confused or helped them.
Three Ways of Looking at the Same Object
The data revealed three distinct patterns of attention and behavior. Scholars spent most of their time on technical panels listing materials, sizes, and dates, moving in a neat, linear path from main image to data to related items. They completed tasks quickly and reported the lowest mental strain. Enthusiasts, by contrast, kept cycling between the main image and sections that explain historical background and symbolism, using stories to deepen their understanding. Tourists focused heavily on the main 3D image and eye‑catching recommendations, often getting lost in the interface. They misread category labels, stumbled over terms like "gilded," made far more mis‑clicks than the other groups, and reported feeling overloaded and unsure what to do next. In other words, the same page served experts well, intrigued enthusiasts, and quietly shut out newcomers.

Designing a Museum That Listens and Responds
Drawing on theories of empathy and the "materiality" of media, the authors argue that digital heritage should move from static display toward a more sensory, story‑rich experience. They propose a multimodal redesign centered on the fusion of sight and sound. Visually, pages would gain clear, dynamic guides that highlight important details, show hotspots on motifs like dragons or lacquer textures, and re‑arrange sections based on typical gaze paths. Aurally, each artifact would offer layered audio explanations: expert commentary for scholars, narrative storytelling for enthusiasts, and plain‑language introductions for casual visitors. A voice question‑and‑answer system would let users ask natural questions and receive short, tailored responses, while subtle sound simulations—such as the ring of a bronze bell or the scrape of lacquer work—would evoke the physical presence of the objects.
From Clicking Through to Living the Culture
For a general reader, the takeaway is that a good digital museum is not just a high‑resolution picture gallery. It should feel more like stepping into a guided visit that adapts to who you are and how you explore. By showing how different types of users actually behave on the Palace Museum’s site, this study builds a practical case for redesigning digital heritage systems around human perception rather than technology alone. The authors have not yet implemented their full vision, but they outline a clear roadmap: use real behavior data to drive more intuitive visuals, richer sound, and layered storytelling. If realized, this approach could turn quick, shallow browsing into immersive journeys where people not only admire ancient objects but also connect with the lives, skills, and values they embody.
Citation: Ke, L., Qin, H., Long, J. et al. Multimodal interaction enhancement of digital cultural heritage system: user behavior analysis and interface reconstruction of the heritage scanning library of the palace museum. Sci Rep 16, 10654 (2026). https://doi.org/10.1038/s41598-026-44955-x
Keywords: digital cultural heritage, Palace Museum, museum interface design, multimodal interaction, user experience