Clear Sky Science

Application of representation learning in detecting botnet attacks

2026-03-04

Why hidden cyber armies matter to everyone

Behind everyday internet use, from streaming movies to checking bank balances, silent armies of hacked machines—called botnets—can be marshalled to flood websites, spread scams, or steal data. Spotting these botnets early is hard, especially when attackers constantly change their tactics. This paper presents a new way to "see" suspicious activity in network traffic by turning raw connection data into compact images that a deep learning model can understand, greatly improving the chances of catching new, previously unseen botnet attacks.

The growing problem of quiet online takeovers

Botnets are networks of ordinary devices—laptops, servers, even smart home gadgets—that have been secretly taken over and can be remotely controlled as a single weapon. They can overwhelm online services with junk traffic, send waves of spam and phishing emails, or quietly siphon off personal and financial information. As the number of internet-connected devices explodes, so does the potential size and power of these hidden networks. Traditional defenses rely on known attack “signatures” or simple statistical rules, which work only as long as attackers don’t change their behavior too much. Once a new botnet family or clever disguise appears, these older systems often fail to recognize the threat.

Limits of today’s smart security tools

In recent years, researchers have turned to machine learning and deep learning to automatically spot suspicious patterns in network traffic. Many systems use hand-crafted summaries of connections—such as average packet sizes or connection durations—as input to traditional models like decision trees or random forests. While these methods can work well on the data they were tuned for, they depend heavily on the choice of human-designed features. When a new botnet behaves differently, the old feature set may no longer capture what makes it dangerous. Deep learning has improved matters by learning patterns directly from data, but most approaches still treat network traffic as simple tables of numbers, potentially throwing away subtle relationships that could distinguish a new attack from ordinary activity.

Turning raw traffic into pictures a neural net can read

This study introduces an end-to-end framework that reframes botnet detection as an image-recognition problem. Each network flow, a record that summarizes who talked to whom, for how long, and with how much data, is first carefully encoded. IP addresses are split into their four numerical parts, ports and protocols are represented by how often they occur, and numerical values such as duration and total bytes are scaled to a common range. From the resulting 19 numbers, the method builds a tiny grayscale image using a Hilbert space-filling curve, a winding path that maps the one-dimensional list of values onto a two-dimensional grid while keeping nearby values close together. Even though most pixels are empty, the nonzero ones form small, consistent shapes that a convolutional neural network can learn to recognize as signatures of normal or malicious behavior.
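The encoding steps described above can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the helper names, the example addresses and values, and the choice to place the features at the first positions along the curve are all assumptions, and only 10 of the 19 features are shown.

```python
from collections import Counter


def ip_to_octets(ip):
    """Split a dotted IPv4 address into four values scaled to [0, 1]."""
    return [int(part) / 255.0 for part in ip.split(".")]


def frequency_encode(values):
    """Replace each categorical value (port, protocol) by its relative frequency."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


def min_max_scale(values):
    """Scale numeric values (duration, total bytes) to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hilbert_d2xy(side, d):
    """Convert distance d along a Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two. Consecutive distances land on adjacent cells.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the current sub-square before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def features_to_image(features, side=32):
    """Write features along the curve (assumed layout); unused pixels stay 0."""
    if len(features) > side * side:
        raise ValueError("too many features for this image size")
    image = [[0.0] * side for _ in range(side)]
    for d, value in enumerate(features):
        x, y = hilbert_d2xy(side, d)
        image[y][x] = value
    return image


# Hypothetical mini-batch of four flows (values invented for illustration).
protocols = ["tcp", "udp", "tcp", "tcp"]
proto_freq = frequency_encode(protocols)
durations = min_max_scale([0.2, 3.5, 1.0, 12.0])

# Feature vector for the first flow: source IP, destination IP, protocol, duration.
flow0 = (ip_to_octets("192.0.2.10") + ip_to_octets("198.51.100.7")
         + [proto_freq[0], durations[0]])
image = features_to_image(flow0)
```

Because consecutive positions on the curve are always neighboring pixels, features that sit next to each other in the list stay next to each other in the image, which is the locality property the article refers to.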

Stress-testing the system against brand‑new threats

To see whether this image-based approach truly generalizes, the author uses a realistic benchmark dataset of network traffic, CTU-13, which contains multiple recorded botnet infections mixed with normal activity. The deep learning model is trained only on one botnet family, called Murlo, and then tested on a completely different family, Rbot, that it has never seen before. This setup mimics a real-world "zero‑day" situation, where a defender must flag a new attack pattern on the fly. The proposed system, based on a ResNet-18 image classifier working on compact 32×32 images, classifies flows with about 98% overall accuracy and a similarly high F1-score, while keeping both missed attacks and false alarms low. In sharp contrast, a strong traditional baseline, a Random Forest trained on the same scenario, achieves decent overall accuracy but almost completely fails to recognize the new botnet, misclassifying virtually all malicious traffic as harmless.
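A short sketch may help show why overall accuracy alone can hide the kind of failure described above. The record fields (`capture`, `is_botnet`), the function names, and all counts below are hypothetical and are not taken from the paper or from CTU-13; they only illustrate a cross-family split and the standard metric definitions.

```python
def split_by_capture(flows, train_capture, test_capture):
    """Train on flows from one capture and test on flows from another.

    Assumes each flow is a dict with a 'capture' tag naming the botnet
    capture it came from (normal flows share the tag of their capture).
    """
    train = [f for f in flows if f["capture"] == train_capture]
    test = [f for f in flows if f["capture"] == test_capture]
    return train, test


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with botnet as the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical flows tagged by the capture they were recorded in.
flows = (
    [{"capture": "murlo", "is_botnet": 1}] * 3
    + [{"capture": "murlo", "is_botnet": 0}] * 5
    + [{"capture": "rbot", "is_botnet": 1}] * 2
    + [{"capture": "rbot", "is_botnet": 0}] * 4
)
train, test = split_by_capture(flows, "murlo", "rbot")

# Invented test set: 900 normal flows followed by 100 botnet flows.
y_true = [0] * 900 + [1] * 100

# A model that labels every flow as normal misses every attack,
# yet its accuracy still looks respectable on imbalanced data.
all_normal = binary_metrics(y_true, [0] * 1000)

# A model that catches 98 of the 100 attacks and raises 18 false alarms.
y_pred = [0] * 882 + [1] * 18 + [1] * 98 + [0] * 2
detector = binary_metrics(y_true, y_pred)
```

In this toy setup the always-normal model reaches 90% accuracy with zero recall, which is why the article reports F1-score, missed attacks, and false alarms alongside accuracy.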

What this means for safer networks

The results show that how network data are represented matters as much as which model is used. By organizing connection features into small, locality-preserving images, the system captures the underlying "shape" of malicious behavior rather than memorizing specific numbers tied to one known botnet. This allows it to spot related but different attacks with far greater reliability. Because the method uses metadata and flow statistics instead of looking inside packet contents, it is well suited to today’s world of encrypted communications and sprawling Internet-of-Things devices. In practical terms, this work points toward intrusion detection systems that can adapt to new botnet families with less manual tuning, offering a more resilient line of defense for everyday users and organizations alike.

Citation: Le Ngoc, H. Application of representation learning in detecting botnet attacks. Sci Rep 16, 11977 (2026). https://doi.org/10.1038/s41598-026-40172-8

Keywords: botnet detection, network security, deep learning, representation learning, intrusion detection