Clear Sky Science

Dynamic Kannada Sign Language Recognition on Resource Constrained Devices

2026-02-26

Bridging the Conversation Gap

For many Deaf people in Karnataka, everyday conversations depend on Kannada Sign Language (KSL). Yet most phones and apps only understand spoken and written languages, leaving KSL users without the digital tools others take for granted. This study tackles that gap by building a system that can read short KSL signs from video and run efficiently on ordinary smartphones, opening the door to faster, more private communication between signers and non-signers.

Building a Real-World Sign Library

Because no public video database of KSL words existed, the researchers began by creating one from scratch. They worked with teachers at a school for Deaf children and 38 volunteers from across Karnataka to record more than two thousand videos of KSL signs. The team focused on 33 everyday words grouped into four themes: fruits, months, days of the week, and times of day or seasons. Each word was filmed many times, at different speeds, in different locations, and under varied lighting. This variety helps the system cope with the messy, unpredictable conditions of real life rather than only working in a perfect lab setting.

Teaching Computers to See Motion

Instead of feeding full video images into a heavy vision model, the system first reduces each frame to a set of key points representing the signer’s upper body and hands. Using Google’s MediaPipe Holistic toolkit, the researchers track 59 landmarks (such as shoulder, elbow, wrist, and finger joints) and record their 3D positions over time. This produces a compact “skeleton” of each gesture sequence: 75 frames per video, each with 177 numeric features (59 landmarks × 3 coordinates). To strengthen the system against noise, they expand the dataset with careful video augmentations: small camera tilts, lighting changes, artificial speckles, and motion speed-ups and slow-downs. These steps help the models learn the essence of a sign rather than memorizing a specific background or recording condition.
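
To make the numbers concrete, here is a minimal Python sketch of how 59 three-dimensional landmarks become a 177-value feature vector, and how a clip of any length could be stretched or squeezed to 75 frames. The landmark and frame counts come from the article. The helper names (flatten_frame, resample) and the use of linear interpolation are illustrative assumptions, since the article does not say how the researchers bring clips to a fixed length.

```python
from typing import List, Sequence, Tuple

Landmark = Tuple[float, float, float]  # (x, y, z) for one tracked point
NUM_LANDMARKS = 59                     # from the article
NUM_FEATURES = NUM_LANDMARKS * 3       # 59 * 3 = 177
TARGET_FRAMES = 75                     # from the article


def flatten_frame(landmarks: Sequence[Landmark]) -> List[float]:
    """Turn 59 (x, y, z) landmarks into one 177-value feature vector."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    return [coord for point in landmarks for coord in point]


def resample(frames: Sequence[Sequence[float]],
             target: int = TARGET_FRAMES) -> List[List[float]]:
    """Linearly interpolate a sequence of feature vectors to `target` frames.

    Hypothetical helper: the interpolation scheme is an assumption,
    not something the article describes.
    """
    n = len(frames)
    if n == 0:
        raise ValueError("need at least one frame")
    if n == 1 or target == 1:
        return [list(frames[0]) for _ in range(target)]
    out = []
    for i in range(target):
        pos = i * (n - 1) / (target - 1)  # fractional index into the source clip
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo                      # blend weight between frames lo and hi
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


# Usage: a made-up 30-frame clip in which every landmark drifts along x.
clip = [flatten_frame([(float(t), 0.5, -0.1)] * NUM_LANDMARKS) for t in range(30)]
fixed = resample(clip)
print(len(fixed), len(fixed[0]))  # 75 177
```

The same resampling idea, applied with a different target length before padding or cropping, is one plausible way to imitate a faster or slower signer on keypoint data.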

Three Ways to Read a Moving Sign

With this cleaner representation of movement, the team compares three deep learning approaches for recognizing each signed word. The first is an LSTM, a network designed to follow sequences frame by frame, remembering important details while forgetting distractions. The second, a BiLSTM, reads the gesture in both directions, past-to-future and future-to-past, giving it a richer view of the motion. The third is an encoder-only Transformer, which examines all frames in relation to one another using an attention mechanism: instead of scanning in strict order, it learns which moments in the sign depend most on each other. All three models are trained on the same split of the data into training, validation, and test sets, and are tuned to classify the 33 words from the motion patterns alone.
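
The attention idea can be illustrated with a few lines of plain Python. The sketch below computes scaled dot-product self-attention over a short list of frame vectors: each frame is compared with every other frame, the similarity scores are turned into weights that sum to 1, and the output for each frame is a weighted blend of all frames. It is a deliberately simplified illustration and not the authors' model. A real Transformer encoder also uses learned query, key, and value projections, several attention heads, and positional information, all of which are omitted here.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Convert raw scores into positive weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention with identity projections.

    Returns (outputs, weights); weights[i][j] says how much frame i
    attends to frame j.
    """
    d = len(frames[0])
    scale = math.sqrt(d)
    weights = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in frames]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, frames)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Usage: frames 0 and 2 are identical, so frame 0 attends to frame 2
# as strongly as to itself, and less to the dissimilar frame 1.
toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
_, attn = self_attention(toy)
print(attn[0][0] > attn[0][1])  # True
```

Unlike an LSTM, nothing in this computation depends on reading the frames in order, which is why frames far apart in a sign can influence each other directly.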

Shrinking Powerful Models for Tiny Devices

High-accuracy models are often too large and slow for resource-limited devices like mid-range phones. To solve this, the authors apply TinyML-style optimizations using TensorFlow Lite. They convert each trained model into smaller versions by reducing the numerical precision of the internal weights—a process known as post-training quantization. Several schemes are tried, including dynamic range, float16, and full-integer variants. These trimmed models are then embedded in a Flutter-based Android app. Because there is not yet built-in support to run MediaPipe Holistic directly on the phone within Flutter, an external, lightweight server extracts the keypoints and sends only the compact motion data back to the app, which performs the final recognition on-device.
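
To show what "reducing numerical precision" means in practice, the following sketch quantizes a handful of floating-point weights to 8-bit integers using a generic textbook affine scheme, then converts them back to see how much is lost. This is an illustration of the principle only. The function names are hypothetical, and TensorFlow Lite's converter applies its own per-tensor or per-channel rules that may differ from this simplified version.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float, int]:
    """Map floats to int8 so that w is approximately scale * (q - zero_point)."""
    # The representable range must include 0 so that zero stays exact.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all weights are zero
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float values from the int8 codes."""
    return [scale * (v - zero_point) for v in q]


# Usage with made-up weights (not taken from the paper's models).
weights = [-0.5, 0.0, 0.25, 1.0]
q, scale, zp = quantize_int8(weights)
restored = dequantize(q, scale, zp)
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(worst <= scale)  # True: the error stays within one quantization step
```

Storing each weight in one byte instead of the four bytes of a 32-bit float is where most of the size reduction of integer quantization comes from, at the cost of the small rounding error measured above.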

Fast, Accurate Sign Reading in Your Hand

Despite being pared down for speed and size, the best models retain strong performance: around 94–96% test accuracy on the 33 KSL words. The dynamically quantized BiLSTM reaches the highest accuracy at 95.71%, while the quantized Transformer offers the fastest on-phone predictions, about 16 milliseconds per sign, with a model size just over 1 MB. The LSTM strikes a balance among size, speed, and accuracy. All three run with modest CPU and memory use, suggesting that real-time KSL recognition can be practical on everyday smartphones without expensive hardware, although the current prototype still depends on a small server for keypoint extraction.

What This Means for Everyday Life

In plain terms, this work shows that it is possible to give a regular smartphone the ability to “understand” a core set of KSL words from short videos, reliably and quickly. By crafting a dedicated KSL video dataset, distilling gestures down to body and hand skeletons, and compressing modern sequence models to run efficiently on the edge, the researchers provide a blueprint for accessible sign recognition technology tailored to a regional language. While the current system handles only 33 isolated words and still relies on a small server for feature extraction, it marks a concrete step toward richer, fully on-device tools that could help hundreds of thousands of KSL users communicate more smoothly with the hearing world.

Citation: V, U., K S, N., K S, N. et al. Dynamic Kannada Sign Language Recognition on Resource Constrained Devices. Sci Rep 16, 11186 (2026). https://doi.org/10.1038/s41598-026-40181-7

Keywords: Kannada sign language, mobile sign recognition, TinyML, gesture recognition, assistive technology