Clear Sky Science

Nominating Confucian(s) in Ngram Viewer: a DH-CL approach to Confucian identity in Anglophone discourse

Why this study matters for today’s readers

When we look up ideas like Confucianism online, we tend to trust the graphs and search tools that tell us how often words appear in books. This article asks a simple but powerful question: who actually controls these pictures of the past, and what do they quietly suggest about Chinese thought and identity? By tracing the word “Confucian” in millions of English books, the authors show how digital tools can magnify cultural bias while appearing neutral and objective.

How a word became a cultural label

Confucian identity originally referred to Ru, a long tradition of Chinese thinkers, teachers, and institutions. In modern English, however, “Confucian” is often treated as a catchall label for Chinese or East Asian culture. Earlier debates about who counts as a Confucian struggled with vague definitions and scarce data. This study tackles those debates with large-scale digital evidence, arguing that, in practice, “Confucian” has come to work less as a philosophical category and more as an ethnocultural tag applied from the outside.

Using big data to follow the trail of words

The authors combine three approaches: digital humanities, corpus linguistics, and critical discourse analysis. They use Google Books Ngram Viewer, a tool built on the world’s largest digitised collection of books, to see how words linked to “Confucian” have appeared in English publications from 1973 to 2022. They collect 260 nearby words (collocates) and 214 syntactic partners, then group these words by meaning using specialist software. This “dual triangulation” allows them to cross-check numerical patterns, language structures, and historical interpretation so that no single method or dataset dominates the story.
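For readers curious what "collecting nearby words" can look like in practice, here is a minimal Python sketch. It is an illustration only, not the authors' pipeline: Ngram Viewer works from precomputed n-gram counts rather than raw text, and the function name `window_collocates`, the window size, and the sample sentences are all assumptions invented for this example.

```python
from collections import Counter
import re


def window_collocates(text, node, span=4):
    """Count words that occur within `span` tokens on either side of `node`.

    Hypothetical helper for illustration; the study's own tooling is not
    described at this level of detail.
    """
    # Lowercase and keep alphabetic tokens only (a deliberately crude tokenizer).
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - span)
            # Words before the node plus words after it, excluding the node itself.
            neighbours = tokens[lo:i] + tokens[i + 1:i + 1 + span]
            counts.update(neighbours)
    return counts


# Toy sentences invented for this sketch; they are not data from the study.
sample = (
    "The early Confucian scholars of the Song dynasty debated ritual. "
    "They described Chinese Confucian ethics as ancient."
)

collocates = window_collocates(sample, "confucian", span=2)
for word, freq in collocates.most_common():
    print(word, freq)
```

With a window of two tokens, words such as "early" and "Chinese" are counted as neighbours of "Confucian", while "Song" falls outside the window. A real analysis would work over far larger data and would handle syntactic partners separately, since those depend on grammatical relations rather than simple distance.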

Figure 1. Global English books and digital tools shaping how people see Confucian identity over time.

What the numbers reveal about naming and meaning

The results show that “Confucian” and “Confucians” overwhelmingly dominate other possible English labels for Ru, such as “Ruist” or “Confucianist.” In other words, one Western-made term has effectively set the global standard. Looking at the company that “Confucian” keeps in sentences, the study finds that it clusters heavily with words about nations, dynasties, and time periods, such as “Chinese,” “Song,” “Ming,” “early,” and “neo.” It also appears alongside references to other philosophies and religions like Daoism, Buddhism, and Christianity. Far less frequent are everyday words of belief, ethics, or learning, suggesting that the label is anchored more in geography and history than in ideas or practices.

How distance and otherness are built into the language

Beyond raw frequencies, the study looks at how pronouns and names frame who is speaking and who is being spoken about. In the English books examined, Confucians are usually “they” or “them,” not “we.” References place them in distant times and places, often wrapped in dynastic names or descriptions like “early” and “last.” Even celebrated modern figures such as Liang Shuming are cast as “the last Confucian,” as if the tradition has ended. The authors call this pattern a form of “data orientalism,” where digital systems and search interfaces gently push readers toward seeing Confucianism as an exotic, ancient object rather than a living, self-defined identity.

Figure 2. Step by step analysis of language around Confucian identity revealing a shift toward ethnic and historical framing.

Rethinking our digital mirrors of culture

For non-specialists, the central message is that our most trusted digital mirrors of culture, such as Ngram Viewer and Google Books, do not just reflect the world; they help shape it. This study finds that modern Confucian identity, as seen through global English books, is framed mainly as a Chinese or East Asian ethnic and historical label, constructed from the outside rather than by contemporary Confucians themselves. The authors urge readers and researchers to treat big cultural datasets with critical care, to build more balanced corpora, and to pay closer attention to how classics like the Analects are read. In doing so, we can move toward digital tools that illuminate cross-cultural understanding instead of quietly reinforcing old divides.

Citation: Gui, X., Kaur, S. Nominating Confucian(s) in Ngram Viewer: a DH-CL approach to Confucian identity in Anglophone discourse. Humanit Soc Sci Commun 13, 736 (2026). https://doi.org/10.1057/s41599-026-07161-8

Keywords: Confucian identity, digital humanities, Google Books, cultural bias, Anglophone discourse