Clear Sky Science

Human versus artificial intelligence: investigating ability of young academics from research and non-research institutions to identify ChatGPT-generated dental research abstracts

2026-03-05

Why this study matters to everyday readers

As tools like ChatGPT rapidly enter classrooms and research labs, many people are asking a simple question: can we actually tell when a computer has written something that looks scientific? This study examines that problem in a practical setting, dental research, and tests whether young university teachers can spot AI-written research summaries, and how their skills compare with specialized AI-detection software.

Putting people and machines to the test

The researchers focused on a specific and important slice of scientific writing: the abstract, the short summary at the start of a research paper that most readers see first. They collected 75 real abstracts from leading dental journals and then asked ChatGPT to write 75 new abstracts using the same titles. That produced a pool of 150 texts, half human-written and half AI-generated, all of which looked like genuine research summaries and carried no visible sign of their origin.

Young academics in the hot seat

Six early-career dental academics, all with less than two years of teaching and research experience, were recruited from six universities in Malaysia: three government research universities and three private non-research institutions. Each person received a mix of real and AI-written abstracts, stripped of any journal names or author details so that only the wording remained. They were asked to decide whether each abstract was human- or AI-written, and to grade its quality using a simple scoring sheet that rated clarity, flow, creativity, depth of understanding, grammar, use of technical language, and field-specific knowledge.

How software judges the same texts

The same 150 abstracts were then evaluated by three different AI-output detectors and a widely used similarity checker. The AI detectors estimate how likely it is that a text came from a system like ChatGPT, while the similarity checker (Turnitin) compares the text against huge databases of existing writing to see how closely it matches. Together, these tools represent the kinds of digital safeguards that universities are starting to rely on to protect academic integrity as AI-assisted writing becomes more common.

Who did better, humans or machines?

The young academics struggled more than they might have expected. Their success in identifying whether an abstract was human- or AI-generated ranged from 44% to 76%, which at the low end is no better than guessing. Reviewers from research-intensive universities did not clearly outperform those from teaching-focused private universities; individual differences mattered more than the type of institution. Interestingly, when grading quality, the reviewers tended to rate real abstracts as good to excellent and AI abstracts mostly as average, suggesting they could sense differences in depth and nuance even when they misjudged who wrote the text.
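To make the comparison with guessing concrete, here is a minimal sketch of how an identification score of this kind can be computed. The function name `accuracy` and the ten labels below are invented for illustration; they are not the study's data or the authors' scoring method. The sketch assumes a pool split evenly between human and AI texts, in which case a reviewer who gives the same answer every time still scores 50%.

```python
def accuracy(true_labels, guesses):
    """Return the fraction of abstracts whose origin was guessed correctly."""
    if len(true_labels) != len(guesses):
        raise ValueError("true_labels and guesses must have the same length")
    correct = sum(t == g for t, g in zip(true_labels, guesses))
    return correct / len(true_labels)


# Hypothetical, made-up labels (NOT data from the study):
# a balanced set of ten abstracts, five human-written and five AI-written.
true_labels = ["human", "ai"] * 5

# A hypothetical reviewer's guesses for the same ten abstracts.
guesses = ["human", "human", "ai", "ai", "human",
           "human", "ai", "ai", "human", "human"]

reviewer_score = accuracy(true_labels, guesses)

# Baseline: answering "ai" for every abstract in a balanced set.
baseline_score = accuracy(true_labels, ["ai"] * len(true_labels))

print(f"reviewer: {reviewer_score:.0%}, always-'ai' baseline: {baseline_score:.0%}")
```

Under these assumptions, a score close to 50% says little about a reviewer's ability to tell the two sources apart, which is why results at the lower end of the reported range are hard to distinguish from guessing.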

Detectors that outperformed their human users

The software, especially one tool called GPTZero, proved more reliable at telling human and AI writing apart. GPTZero correctly classified about nine out of ten abstracts, far better than the human reviewers and better than the two other AI detectors tested. The similarity checker also performed strongly: almost all real abstracts showed very high similarity to existing sources (as they were actual published work), while AI-generated abstracts tended to have low to moderate similarity, reflecting ChatGPT’s ability to rephrase rather than copy. Together, these tools showed that automated detection can currently outpace unassisted human judgment, at least for early-career academics reading technical texts.

What this means for education and research

For non-specialists, the key message is that even trained young academics find it hard to reliably spot polished AI-written research summaries just by reading them, and their institutional setting, whether research-heavy or not, does not guarantee sharper instincts. At the same time, some detection tools already do a surprisingly good job, although they are not perfect and their accuracy can shift as AI systems evolve. The authors conclude that universities should not rely on human judgment alone, nor on any single detector. Instead, they argue for a combined approach: better training in AI literacy for early-career staff, thoughtful use of multiple detection tools, and clear ethical guidelines so that human expertise and artificial intelligence work together to protect the trustworthiness of scientific writing.

Citation: AL-Rawas, M., Abdul Qader, O.A.J., Lin, G.S.S. et al. Human versus artificial intelligence: investigating ability of young academics from research and non-research institutions to identify ChatGPT-generated dental research abstracts. Sci Rep 16, 12287 (2026). https://doi.org/10.1038/s41598-026-42555-3

Keywords: ChatGPT, academic integrity, AI detection, dental research, early-career academics