Clear Sky Science · en

An innovative framework for secure data transmission using machine learning based classification and ElGamal encryption with Ramanujan primes


Why protecting everyday messages matters

Every day, banks, hospitals, and governments send short digital messages that can range from routine news alerts to highly sensitive account updates. Treating all of these messages as if they were equally secret wastes computing power, but being too relaxed can expose people to fraud and privacy breaches. This paper explores a way to automatically sort messages by how sensitive they are and then protect them with matching levels of encryption, aiming to balance safety with speed and resource use.

Sorting harmless notes from critical alerts

To begin, the authors build a simple text-classification system that separates ordinary messages, such as general news headlines, from highly sensitive ones, such as banking notifications and transaction alerts. They create a small dataset of 200 short, carefully written sentences, half financial and half general news, and clean the text by removing punctuation, numbers, and common stop words. Each message is turned into a numerical fingerprint using a standard technique that emphasizes words that are frequent in one message but rare overall. Several popular machine learning methods are tested, including K-Nearest Neighbors, Support Vector Machines, Linear Discriminant Analysis, and K-means clustering. Each method is evaluated with five-fold cross-validation, which repeatedly scores the model on messages it was not trained on and so helps reveal overfitting. Under this test, the Support Vector Machine model delivers the most accurate and stable performance, making it the preferred tool for deciding whether a message is merely routine or truly sensitive.
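The cleaning and fingerprinting steps can be sketched in a few lines of Python. The description of the fingerprint matches term frequency-inverse document frequency (TF-IDF), so that is what this sketch assumes. The stop-word list, the four training sentences, and the helper names are all hypothetical, and the classifier shown is a one-nearest-neighbor lookup by cosine similarity, used here only because it fits in a short example; it is a stand-in for, not a reproduction of, the Support Vector Machine the authors prefer.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "the", "your", "for", "from", "to", "was", "is", "of", "in"}

def tokenize(text):
    # Lowercase, replace punctuation and digits with spaces, drop stop words.
    words = re.sub(r"[^a-z\s]", " ", text.lower()).split()
    return [w for w in words if w not in STOP_WORDS]

def fit_idf(docs):
    # Inverse document frequency: words found in few messages score higher.
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n / count) for word, count in df.items()}

def tfidf(doc, idf):
    # Term frequency times IDF; words unseen during fitting are ignored.
    counts = Counter(doc)
    return {w: (c / len(doc)) * idf[w] for w, c in counts.items() if w in idf}

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0

# Hypothetical training sentences, not taken from the authors' dataset.
TRAIN = [
    ("Your account was debited for a card payment", "sensitive"),
    ("Transfer to your savings account completed", "sensitive"),
    ("City council approves new park downtown", "general"),
    ("Local team wins the championship final", "general"),
]

train_tokens = [tokenize(text) for text, _ in TRAIN]
IDF = fit_idf(train_tokens)
train_vectors = [tfidf(tokens, IDF) for tokens in train_tokens]

def classify(text):
    # One-nearest-neighbor: return the label of the most similar training message.
    query = tfidf(tokenize(text), IDF)
    scores = [cosine(query, vec) for vec in train_vectors]
    return TRAIN[scores.index(max(scores))][1]

print(classify("Payment debited from your account"))
```

With only four training sentences the result means little statistically; the point is the flow from raw text to cleaned tokens to weighted vector to label.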

Two encryption routes for two types of data

Once messages are labeled, they travel down one of two encryption paths. Normally sensitive messages are protected using the standard ElGamal public-key scheme, a well-known method that relies on the difficulty of a mathematical puzzle called the discrete logarithm problem. Highly sensitive messages follow a modified route that is identical in how it scrambles and unscrambles data but differs in how it chooses one of the crucial numbers in the key, the prime modulus. Here, the authors experiment with a special family of prime numbers called Ramanujan primes, which have interesting spacing properties among the primes. Importantly, the authors stress that this choice does not make the underlying mathematics harder to break; instead, it offers a structured and novel way to generate keys without changing the proven security foundations of ElGamal.
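To make the highly sensitive route concrete, here is a toy sketch in Python. It lists Ramanujan primes from their definition (the n-th Ramanujan prime is the smallest number R such that at least n primes lie between x/2 and x for every x at or above R), picks one as the ElGamal modulus, and runs textbook ElGamal on it. Everything about the setup is an assumption made for illustration: the function names, the rule for picking the prime, the brute-force generator search, and the byte-by-byte encoding are not taken from the paper, and a modulus this small offers no real security.

```python
import math
import secrets

def ramanujan_primes(count):
    """Return the first `count` Ramanujan primes (toy sizes only)."""
    # Sondow's bound R_n < 4n*ln(4n) tells us how far to sieve.
    limit = math.ceil(4 * count * math.log(4 * count))
    is_prime = [False, False] + [True] * (limit - 1)
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    # pi[x] = number of primes <= x
    pi = [0] * (limit + 1)
    for x in range(1, limit + 1):
        pi[x] = pi[x - 1] + is_prime[x]
    # last[k] = largest x <= limit where exactly k primes lie in (x/2, x]
    last = [0] * count
    for x in range(1, limit + 1):
        k = pi[x] - pi[x // 2]
        if k < count:
            last[k] = x
    # R_n is one past the largest x where that count is still below n.
    result, largest = [], 0
    for n in range(1, count + 1):
        largest = max(largest, last[n - 1])
        result.append(largest + 1)
    return result

def primitive_root(p):
    """Brute-force search for a generator of the multiplicative group mod p."""
    factors, m, q = set(), p - 1, 2
    while q * q <= m:
        while m % q == 0:
            factors.add(q)
            m //= q
        q += 1
    if m > 1:
        factors.add(m)
    for g in range(2, p):
        if all(pow(g, (p - 1) // f, p) != 1 for f in factors):
            return g
    raise ValueError("no primitive root found")

def keygen(p, g):
    x = secrets.randbelow(p - 3) + 2        # private key in [2, p-2]
    return x, pow(g, x, p)                  # (private x, public h = g^x mod p)

def encrypt(m, p, g, h):
    k = secrets.randbelow(p - 3) + 2        # fresh ephemeral key per message
    return pow(g, k, p), (m * pow(h, k, p)) % p

def decrypt(c1, c2, p, x):
    s_inv = pow(c1, p - 1 - x, p)           # inverse of c1^x by Fermat's little theorem
    return (c2 * s_inv) % p

# Pick the first Ramanujan prime above 256 so every byte value (shifted by 1) fits.
p = next(r for r in ramanujan_primes(40) if r > 256)
g = primitive_root(p)
x, h = keygen(p, g)

message = b"Your account was debited"
cipher = [encrypt(b + 1, p, g, h) for b in message]   # +1 keeps plaintext in [1, p-1]
plain = bytes(decrypt(c1, c2, p, x) - 1 for c1, c2 in cipher)
print(plain == message)
```

The encryption and decryption functions never check whether the modulus is a Ramanujan prime, which mirrors the point made above: the special prime changes how the key is generated, not how ElGamal works or how hard it is to break.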

Figure 1.

Checking that nothing is tampered with

Encryption alone does not guarantee that a message has not been altered in transit. To add this protection, the framework attaches a hash-based message authentication code (HMAC) to every encrypted message before it is sent. This mechanism uses a shared secret and a one-way hash function to produce a compact tag that changes if even a single bit of the message is modified. On the receiver side, the same secret and hash are used to recompute the tag and compare it with the one that was sent; only if they match is the message accepted as authentic. The authors implement all steps—classification, key generation, encryption, decryption, and HMAC—within a single Python program and evaluate how long each operation takes and how much data can be processed per unit of time.
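The tag-and-verify step can be illustrated with Python's standard `hmac` module. This is a minimal sketch under stated assumptions: the summary does not say which hash function the authors use, so SHA-256 is assumed here, and the key, the ciphertext bytes, and the function names are placeholders.

```python
import hashlib
import hmac

def make_tag(key: bytes, ciphertext: bytes) -> bytes:
    # Compute an HMAC tag over the encrypted message (SHA-256 is an assumption).
    return hmac.new(key, ciphertext, hashlib.sha256).digest()

def verify_tag(key: bytes, ciphertext: bytes, tag: bytes) -> bool:
    # Recompute the tag and compare in constant time to avoid timing leaks.
    return hmac.compare_digest(make_tag(key, ciphertext), tag)

shared_key = b"hypothetical-shared-secret"      # placeholder key
ciphertext = b"\x12\x9a\x03\x44 example bytes"  # placeholder ciphertext

tag = make_tag(shared_key, ciphertext)
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]  # flip a single bit

print(verify_tag(shared_key, ciphertext, tag))  # accepted
print(verify_tag(shared_key, tampered, tag))    # rejected
```

Using `hmac.compare_digest` rather than `==` is a common precaution, since an ordinary comparison can reveal through timing how many leading bytes of a forged tag were correct.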

What the timing results reveal

Performance tests compare the treatment of normally sensitive and highly sensitive messages, both with and without the extra HMAC step. As expected, adding authentication increases the processing time for all messages. When Ramanujan primes are used for the highly sensitive route, the encryption and decryption of those messages exhibit lower average data rate and throughput than the ordinary route, meaning the system handles fewer kilobytes per millisecond and each bit of data takes longer to process. From a lay perspective, the framework is deliberately spending more time and computing effort on the most sensitive traffic, while the less critical messages move through more quickly. At the same time, the authors note that this extra overhead for critical data translates into lower memory usage per unit of data, which may help keep resource demands manageable on busy servers.
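A throughput figure of this kind can be obtained with a simple timing loop. The sketch below is only an assumption about how such a measurement might be set up; the summary does not give the authors' exact definitions of data rate and throughput. The function name, the repeat count, and the SHA-256 stand-in workload are hypothetical.

```python
import hashlib
import time

def throughput_kb_per_ms(func, payload: bytes, repeats: int = 100) -> float:
    """Kilobytes processed per millisecond when `func` is applied to `payload`."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(payload)
    elapsed_ms = (time.perf_counter() - start) * 1000
    elapsed_ms = max(elapsed_ms, 1e-9)          # guard against a zero reading
    return (len(payload) * repeats / 1024) / elapsed_ms

# Stand-in workload: hashing 4 KB of data instead of a real encryption call.
payload = b"x" * 4096
rate = throughput_kb_per_ms(lambda data: hashlib.sha256(data).digest(), payload)
print(f"{rate:.2f} KB/ms")
```

Passing an encryption or decryption function in place of the hashing lambda would give the per-route comparison described above; absolute numbers depend heavily on the machine and should only be compared between runs on the same hardware.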

Figure 2.

What this work means for secure communication

In simple terms, the study shows that it is possible to design a security system that automatically gauges how sensitive a message is, then routes it through a matching level of protection, all while preserving the core safety guarantees of a trusted encryption method. The use of Ramanujan primes adds a mathematically novel twist to the way key parameters are chosen, without claiming to strengthen the security beyond that of standard ElGamal. Though the text classifier is only a proof of concept built on a small, carefully curated dataset, the overall architecture points toward future systems in which everyday messages, financial alerts, and medical updates can be handled differently yet consistently, conserving computing resources without compromising the privacy and integrity of the information people care about most.

Citation: Haritha, N., Narayanan, V. & Srikanth, R. An innovative framework for secure data transmission using machine learning based classification and ElGamal encryption with Ramanujan primes. Sci Rep 16, 11090 (2026). https://doi.org/10.1038/s41598-026-40797-9

Keywords: secure data transmission, text classification, public key encryption, Ramanujan primes, HMAC authentication