Clear Sky Science · en

Deep learning-based visual algorithms for identity and action recognition in engineering practical courses


Watching who does what in hands-on classes

In many engineering labs, students move around connecting wires, typing code, and checking their phones. For teachers, it is hard to know who is doing which task, and for computers this busy scene is even harder to read. This study introduces an artificial intelligence system that can reliably recognize both student identity and simple actions in a real teaching lab, even when people turn away from the camera or change position.

Figure 1. How an AI system keeps track of moving students and their actions in a busy engineering classroom.

Why regular face checks are not enough

Modern face recognition works well when people sit still and look toward a camera, as in a lecture hall or at a security gate. In practical engineering courses, however, students bend over tables, turn their heads, and walk around equipment. Under these conditions, standard face recognition often loses track of people because it relies on clear, frontal views. Person re-identification methods that focus on full-body appearance have similar problems, since body shape and clothing look very different as students lean, rotate, or move across the room.

Blending faces, bodies, and motion

The authors propose a framework that combines information from both face and upper-body images and keeps updating what it knows as the class unfolds. Before class, each student provides a clear front-facing photo to the school system. At the start of the session, students log in by looking at the camera while standing in the lab. The system matches their faces to the stored photos and, at the same time, records a body image for each logged-in person. These initial face and body features form the starting point for tracking everyone later in the class.
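The login step can be pictured with a short Python sketch. It is only an illustration under stated assumptions: features are treated as plain lists of numbers compared by cosine similarity, and the function name `enroll`, the 0.8 threshold, and the data layout are hypothetical rather than taken from the study.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def enroll(stored_faces, detections, threshold=0.8):
    """Match login detections to stored photos (hypothetical helper).

    stored_faces: {student_id: face_feature} from the pre-class photos.
    detections:   list of (face_feature, body_feature) seen at login.
    Returns {student_id: {"face": ..., "body": ...}} for matched students.
    The threshold value is an assumption, not a figure from the paper.
    """
    enrolled = {}
    for face, body in detections:
        best_id, best_sim = None, threshold
        for student_id, reference in stored_faces.items():
            if student_id in enrolled:
                continue  # each student logs in once
            sim = cosine(face, reference)
            if sim >= best_sim:
                best_id, best_sim = student_id, sim
        if best_id is not None:
            # Keep the body view captured at the same moment as the face.
            enrolled[best_id] = {"face": face, "body": body}
    return enrolled


# Illustrative (made-up) features for two students and three detections.
stored = {"s1": [1.0, 0.0, 0.0], "s2": [0.0, 1.0, 0.0]}
seen = [([0.9, 0.1, 0.0], [5.0, 5.0]),
        ([0.1, 0.95, 0.0], [7.0, 1.0]),
        ([0.0, 0.0, 1.0], [2.0, 2.0])]  # matches nobody on file
roster = enroll(stored, seen)
```

In this toy run the third detection resembles no stored photo, so it is left out of the roster instead of being forced onto a student.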

Building a living memory of each student

Once the practical course begins, the system analyzes video at around ten to fifteen frames per second. For every frame, it detects faces and bodies and extracts compact numerical descriptions of each. If a face in the current frame matches the face on file, but the body match is poor, the system assumes the face is trustworthy and adds the new body view to a dynamic body library. In other situations, when a body matches well and its position changes only slightly between frames while the face is briefly missing, the system treats this as a quick head turn and adds the new face view to a dynamic face library. Over time, each student is represented by many examples of their face and body under different angles, scales, and lighting conditions, which makes recognition in later frames much more reliable.
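The two update rules described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the class name `StudentMemory`, the similarity and movement thresholds, and the use of cosine similarity are all assumptions. The sketch also assumes that during a head turn a face is still detected but no longer matches the stored views, which is one reading of the description.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(feature, library):
    """Highest similarity between a feature and any stored view."""
    return max((cosine(feature, view) for view in library), default=0.0)


class StudentMemory:
    """Hypothetical per-student store of face and body views."""

    def __init__(self, face, body, position):
        self.faces = [face]       # seeded at login
        self.bodies = [body]      # seeded at login
        self.position = position  # last confirmed (x, y), e.g. in pixels

    def update(self, face, body, position, sim_thresh=0.8, move_thresh=20.0):
        """Apply the two rules; return which library grew, if any.

        Both thresholds are illustrative assumptions.
        """
        face_ok = best_match(face, self.faces) >= sim_thresh
        body_ok = best_match(body, self.bodies) >= sim_thresh
        small_move = math.dist(position, self.position) <= move_thresh
        grew = None
        if face_ok and not body_ok:
            # Trust the face and remember the unfamiliar body view.
            self.bodies.append(body)
            grew = "body"
        elif body_ok and not face_ok and small_move:
            # Same body, barely moved: treat as a head turn, keep the face view.
            self.faces.append(face)
            grew = "face"
        if face_ok or (body_ok and small_move):
            self.position = position  # identity confirmed in this frame
        return grew
```

A frame where the body matches but the position jumps far from the last confirmed location adds nothing to either library, which keeps a look-alike elsewhere in the room from polluting a student's record.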

Figure 2. How the algorithm fuses changing face and body views over time to reliably identify students and spot key lab actions.

Teaching the computer to notice simple actions

Beyond knowing who is in the room, instructors also care about what students are doing. The researchers add a behavior recognition component that focuses on a few key lab activities, such as programming at a laptop, connecting wires, or using a phone. A separate tool draws stick-figure-like skeletons over human bodies, capturing the arrangement of head, torso, and limbs. The team then trains a lightweight image classifier to distinguish these skeleton-based poses. Because this model analyzes simplified outlines instead of full images, it can process more than twenty video frames per second, fast enough to keep up with typical classroom cameras.

Testing the system in a real lab

The framework was evaluated in a servo motor control course with six students working on tasks such as wiring components, resetting a motor to its origin, and writing motion programs. The authors compared three options: face recognition alone, body-based re-identification alone, and their combined dynamic method. During the login period and throughout the practical session, the combined approach clearly outperformed the other two, achieving higher precision and better overall scores when deciding which student appeared in each video frame. For the action module, recognition accuracy ranged from about two thirds for programming to over four fifths for phone use, despite the use of a relatively small training set.
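To make "precision" and "overall scores" concrete, the Python sketch below computes per-student precision, recall, and F1 from frame-by-frame labels. It assumes the overall score is an F1-style measure, which is a common choice but is not confirmed by this summary. The function name and the labels are invented for illustration and are not data from the study.

```python
def precision_recall_f1(predicted, actual, student):
    """Frame-level scores for one student (hypothetical helper).

    predicted: identity assigned in each frame, or None if nobody was named.
    actual:    true identity in each frame.
    """
    tp = sum(p == student and a == student for p, a in zip(predicted, actual))
    fp = sum(p == student and a != student for p, a in zip(predicted, actual))
    fn = sum(p != student and a == student for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up labels for six frames, purely to show the arithmetic.
predicted = ["A", "A", "B", None, "B", "A"]
actual = ["A", "B", "B", "A", "B", "A"]
p, r, f1 = precision_recall_f1(predicted, actual, "A")
```

Here student A is named correctly in two frames, named wrongly in one, and missed in one, so precision, recall, and F1 each come out to two thirds.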

What this means for future classrooms

The main message is that blending different visual cues and updating them over time can help computers keep track of who is who in a busy teaching lab, while also recognizing a few simple behaviors. The system still struggles with strong side views of the face and with the full variety of student movements, but the authors outline ways to improve it using three-dimensional face models and richer training data. They also stress the need for privacy safeguards, such as storing only the most necessary features and encrypting original images. Together, these ideas point toward lab environments where computers quietly support teachers by monitoring participation and activity without interrupting hands-on learning.

Citation: Ma, J., Wang, R. & Lan, W. Deep learning-based visual algorithms for identity and action recognition in engineering practical courses. Sci Rep 16, 15524 (2026). https://doi.org/10.1038/s41598-026-45964-6

Keywords: student monitoring, face recognition, action recognition, engineering education, computer vision