Clear Sky Science
Collective and augmented intelligence outperform artificial intelligence on emotion recognition tests
Why this matters for everyday life
Who is better at reading emotions from a glance at someone’s eyes: people or machines? As artificial intelligence systems move into schools, clinics, and workplaces, many tools promise to judge moods and mental states from faces. This study shows that while a powerful AI model can beat most individual people on lab-style emotion tests, groups of people working independently still come out ahead, and the best results of all appear when human and machine judgments are combined.

How the tests of emotion reading work
The researchers focused on two widely used lab tasks that ask people to infer feelings and thoughts just from photographs of the eye region. In each test, viewers see an image and must pick which of four short words best matches the person’s mental state. One test uses black-and-white photos drawn mainly from a single ethnic group, while the newer version includes color images of people from more diverse backgrounds and uses simpler vocabulary. Decades of research link scores on these tests to social skills and clinical outcomes, even though they are not perfect mirrors of real-world emotional life.
How a leading AI compares to individual people
The team evaluated a strong multimodal language model called GPT-5 mini, which can analyze images and text. They ran the model 100 times on every test item, without giving any practice examples, to capture its baseline performance. Compared with data from more than 27,000 human participants, GPT-5 mini answered correctly about 83 percent of the time on both tests, clearly above the human averages of 71 and 63 percent. Detailed analyses across the whole range of human ability showed that the AI outperformed nearly all low- and mid-scoring people. On the older test, however, the very best human scorers matched or slightly edged out the model, while on the newer multiracial test the AI kept its lead even at the top end.
Why crowds of people beat crowds of machines
Next, the researchers asked what happens when many separate answers are pooled. They simulated crowds by repeatedly sampling sets of people, or sets of AI runs, and letting the most common answer win, a simple rule called plurality voting. Human crowds improved sharply with size; when 100 people’s answers were combined, accuracy on one test approached perfection. In contrast, AI crowds gained little from adding more runs. Different calls to the same model tended to repeat the same mistakes, so the group could not correct its own errors. In effect, this was like asking the same expert the same question many times, rather than drawing on varied life experiences.
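To make the voting idea concrete, here is a minimal Python sketch of plurality voting over sampled crowds. It is not the study's code and uses no study data: the answer key, the pools of "people" and "model runs", the error rates, and every function name are assumptions invented for illustration. The toy people make independent mistakes, while the toy model runs mostly repeat one shared wrong answer on a fixed set of hard items.

```python
import random
from collections import Counter

N_OPTIONS = 4  # each test item offers four candidate words


def plurality_vote(answers, rng):
    """Return the most common answer, breaking ties at random."""
    counts = Counter(answers)
    top = max(counts.values())
    tied = sorted(a for a, c in counts.items() if c == top)
    return rng.choice(tied)


def crowd_accuracy(pool, key, crowd_size, n_crowds, rng):
    """Sample crowds from the pool, vote on each item, return the share of correct votes."""
    hits = 0
    for _ in range(n_crowds):
        crowd = rng.sample(pool, crowd_size)
        for i, truth in enumerate(key):
            hits += plurality_vote([member[i] for member in crowd], rng) == truth
    return hits / (n_crowds * len(key))


def wrong_option(truth, rng):
    """Pick one of the three incorrect options at random."""
    return rng.choice([o for o in range(N_OPTIONS) if o != truth])


def independent_pool(key, n_voters, p_correct, rng):
    """Toy 'people' (assumption): each voter errs independently on each item."""
    return [[t if rng.random() < p_correct else wrong_option(t, rng) for t in key]
            for _ in range(n_voters)]


def correlated_pool(key, n_runs, hard_items, rng, p_easy=0.95, p_repeat=0.85):
    """Toy 'model runs' (assumption): on hard items most runs give the same wrong answer."""
    shared_error = {i: (key[i] + 1) % N_OPTIONS for i in hard_items}
    pool = []
    for _ in range(n_runs):
        run = []
        for i, t in enumerate(key):
            if i in shared_error:
                run.append(shared_error[i] if rng.random() < p_repeat else t)
            else:
                run.append(t if rng.random() < p_easy else wrong_option(t, rng))
        pool.append(run)
    return pool


rng = random.Random(0)
key = [rng.randrange(N_OPTIONS) for _ in range(40)]  # synthetic answer key
people = independent_pool(key, 200, 0.65, rng)       # hypothetical 65% individual accuracy
runs = correlated_pool(key, 100, set(range(8)), rng)  # 8 of 40 items are "hard" for the model

# Columns: crowd size, accuracy of people crowds, accuracy of model-run crowds
for size in (1, 5, 25):
    print(size,
          round(crowd_accuracy(people, key, size, 200, rng), 3),
          round(crowd_accuracy(runs, key, size, 200, rng), 3))
```

In this toy setup the people crowds climb with size because scattered individual errors rarely agree on the same wrong word, whereas the model-run crowds stay close to single-run accuracy because the shared error wins the vote on the hard items.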
Humans and AI together work best
The final step was to mix human and AI votes. The researchers built hybrid crowds where most members were people and a smaller share were AI runs, with each side contributing answers independently before they were combined. These augmented groups consistently outperformed both human-only and AI-only crowds. On the newer, more inclusive test, neither humans nor AI alone could get beyond about 95 percent accuracy, but the mixed groups reached roughly 98 percent, and they did so with smaller crowd sizes. This pattern suggests that people and machines tend to make different kinds of mistakes, so their strengths naturally complement one another.
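The complementary-error idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the researchers' procedure: all data are synthetic, the function names are hypothetical, and the "trap" items (where many voters of one kind pick the same misleading option) are an invented device that gives people and model runs different weak spots. The crowd mix of 20 people plus 5 model runs is likewise an arbitrary choice.

```python
import random
from collections import Counter

N_OPTIONS = 4  # four answer choices per item


def plurality_vote(answers, rng):
    """Return the most common answer, breaking ties at random."""
    counts = Counter(answers)
    top = max(counts.values())
    return rng.choice(sorted(a for a, c in counts.items() if c == top))


def make_pool(key, n_members, trap_items, p_normal, p_trap_correct, p_trap_lure, rng):
    """Toy voters (assumption): on 'trap' items many members choose the same wrong lure."""
    pool = []
    for _ in range(n_members):
        answers = []
        for i, truth in enumerate(key):
            others = [o for o in range(N_OPTIONS) if o != truth]
            lure = (truth + 1) % N_OPTIONS
            r = rng.random()
            if i in trap_items:
                if r < p_trap_correct:
                    answers.append(truth)
                elif r < p_trap_correct + p_trap_lure:
                    answers.append(lure)
                else:
                    answers.append(rng.choice(others))  # any wrong option, lure included
            else:
                answers.append(truth if r < p_normal else rng.choice(others))
        pool.append(answers)
    return pool


def hybrid_accuracy(humans, ai_runs, key, n_humans, n_ai, n_crowds, rng):
    """Pool independent human and AI votes per item and score the plurality answer."""
    hits = 0
    for _ in range(n_crowds):
        crowd = rng.sample(humans, n_humans) + rng.sample(ai_runs, n_ai)
        for i, truth in enumerate(key):
            hits += plurality_vote([m[i] for m in crowd], rng) == truth
    return hits / (n_crowds * len(key))


rng = random.Random(0)
key = [rng.randrange(N_OPTIONS) for _ in range(40)]  # synthetic answer key
# Hypothetical pools: people stumble on items 0-5, model runs on items 6-11.
humans = make_pool(key, 200, set(range(0, 6)), 0.70, 0.40, 0.40, rng)
ai_runs = make_pool(key, 100, set(range(6, 12)), 0.95, 0.15, 0.85, rng)

print("humans only:", round(hybrid_accuracy(humans, ai_runs, key, 25, 0, 300, rng), 3))
print("AI only:    ", round(hybrid_accuracy(humans, ai_runs, key, 0, 25, 300, rng), 3))
print("hybrid 20+5:", round(hybrid_accuracy(humans, ai_runs, key, 20, 5, 300, rng), 3))
```

Because the two kinds of voter fail on different items in this sketch, a handful of model votes can tip the items where people are split, and the human majority overrules the model on the items where its runs repeat an error.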

What this means for using emotion AI
The study concludes that comparing AI to an “average human” can be misleading, because it ignores the power of collective human judgment. A strong model like GPT-5 mini may outperform most individuals on narrow lab tests, yet still fall short of what diverse groups of people can achieve together, especially when machines simply repeat the same errors. The most reliable approach for tasks like reading emotions from faces is not to let AI replace people, but to pair human insight with machine consistency in carefully designed systems that keep humans in the loop.
Citation: Akben, M., Gude, V. & Ajjan, H. Collective and augmented intelligence outperform artificial intelligence on emotion recognition tests. Sci Rep 16, 14823 (2026). https://doi.org/10.1038/s41598-026-45331-5
Keywords: emotion recognition, collective intelligence, human AI collaboration, multimodal AI, social cognition