Clear Sky Science · en

Feature importance guided autoencoder for dimensionality reduction in intrusion detection systems

2026-02-04

Why smarter cyber defenses matter

Every email you send, video you stream, and purchase you make travels across networks that are constantly under attack. Intrusion Detection Systems (IDS) act like alarm systems for these networks, spotting suspicious behavior before it turns into a breach. But modern network data are huge and complex, and sifting through all those details can bog systems down or cause them to miss subtle attacks. This paper explores a new way to shrink that data intelligently so IDS tools become both faster and better at catching even rare, hard-to-spot cyberattacks.

The problem with too much network data

Network traffic records contain dozens to hundreds of measurements for every connection, such as duration, number of bytes, and error rates. Machine-learning-based IDS models rely on these measurements to decide whether traffic is normal or malicious. However, using all of them can slow detection and sometimes even hurt accuracy, especially when some attacks are much rarer than others. Common dimensionality reduction methods, like Principal Component Analysis (PCA) and standard autoencoders, compress the data but mainly focus on reconstructing the overall traffic. That means they may pay more attention to the majority of everyday connections and overlook the faint, distinctive patterns that mark minority attack types.

A new way to rank what really matters

The authors introduce a feature ranking scheme called one-versus-all (OVA) feature importance to address this imbalance. Instead of asking, “Which measurements are most useful overall?”, OVA asks that question separately for each attack type. For every class (for example, normal traffic, denial-of-service, or password-guessing), a random forest model is trained to distinguish that class from all others. The model’s built-in importance scores then reveal which measurements are especially helpful for that specific class. By repeating this process class by class and then taking, for each measurement, the highest importance it achieves for any class, the method builds a single weight vector that highlights features that matter for at least one kind of attack—even if that attack is rare in the data.
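
For readers who like to see the idea in code, the ranking step described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the function name `ova_feature_weights`, the forest settings, and the synthetic data are all assumptions made for the example, and the summary does not say whether the paper rescales the resulting weights.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def ova_feature_weights(X, y, n_estimators=100, seed=0):
    """Return one weight per feature: its highest importance in any one-vs-all forest."""
    classes = np.unique(y)
    per_class = np.zeros((len(classes), X.shape[1]))
    for i, c in enumerate(classes):
        # Binary target: this class versus every other class.
        forest = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
        forest.fit(X, (y == c).astype(int))
        per_class[i] = forest.feature_importances_
    # For each feature, keep the highest importance it reaches for any class.
    return per_class.max(axis=0)


# Synthetic stand-in data (not from the paper): 3 classes, 5 features.
rng = np.random.default_rng(0)
y = rng.choice(3, size=600, p=[0.80, 0.15, 0.05])  # class 2 is rare
X = rng.normal(size=(600, 5))
X[:, 0] += 3.0 * (y == 1)  # feature 0 marks class 1
X[:, 3] += 3.0 * (y == 2)  # feature 3 marks the rare class 2

weights = ova_feature_weights(X, y)
print(weights.round(3))
```

In this toy setup, feature 3 only helps with the rare class, yet it still receives a large weight because the maximum is taken class by class rather than over the pooled data. Note that each forest's importances sum to one, but the combined maximum vector generally does not.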

Teaching an autoencoder to focus on key signals

To make use of these weights, the researchers design a feature importance-based autoencoder (FI-AE). Like a conventional autoencoder, FI-AE compresses the input into a low-dimensional “bottleneck” representation and then reconstructs the original data. The twist is in the training objective: instead of treating all reconstruction errors equally, the model uses a weighted mean squared error that multiplies each feature’s error by its OVA-based importance. In simple terms, FI-AE is punished more for misrepresenting measurements that are crucial for telling attacks apart, and less for less informative details. The architecture itself is compact, squeezing network records down to just 16 numbers while using standard techniques such as batch normalization, dropout, and the Adam optimizer to keep training stable.
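
The weighted reconstruction error can be illustrated with a few lines of NumPy. This is a sketch under stated assumptions: the function name `weighted_mse`, the example weights, and the choice to average over all samples and features are illustrative, since the summary does not give the exact normalization used in the paper.

```python
import numpy as np


def weighted_mse(x, x_hat, w):
    """Mean over samples and features of w_j * (x_ij - x_hat_ij)^2."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(np.mean(w * (x - x_hat) ** 2))


x = np.array([[1.0, 2.0, 3.0]])
w = np.array([0.9, 0.1, 0.1])  # hypothetical OVA-based weights

miss_important = np.array([[0.0, 2.0, 3.0]])  # error of 1 on feature 0
miss_minor = np.array([[1.0, 1.0, 3.0]])      # error of 1 on feature 1

loss_important = weighted_mse(x, miss_important, w)
loss_minor = weighted_mse(x, miss_minor, w)
print(loss_important, loss_minor)
```

Both reconstructions are off by the same amount on a single feature, but the first is penalized about nine times more heavily because the mistake falls on the highly weighted feature. During training, that pressure steers the bottleneck toward preserving the measurements that separate attack classes.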

Putting the method to the test

The team evaluates FI-AE on three widely used intrusion detection datasets: NSL-KDD, UNSW-NB15, and CIC-IDS2017, which together cover millions of connections and a wide range of attack types. Before training, they tidy the data by balancing extremely skewed class distributions, scaling numeric features, and encoding categories in a way that preserves their relationship to the target labels. They then compare three pipelines that all end with a random forest classifier: one using PCA, one using a standard autoencoder, and one using FI-AE for dimensionality reduction. Across all three datasets, FI-AE consistently delivers higher accuracy and F1-scores, with particularly noticeable gains on minority and rare attacks where traditional methods tend to struggle.
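
To make the comparison concrete, the PCA baseline might look something like the sketch below, built with scikit-learn. Everything here is an assumption for illustration: the data are synthetic rather than drawn from NSL-KDD, UNSW-NB15, or CIC-IDS2017, the forest settings are arbitrary, and macro-averaged F1 is one plausible reading of "F1-scores". Only the target of 16 reduced features comes from the article. The other two pipelines would swap the PCA step for the encoder half of a trained autoencoder.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for network traffic (not a real IDS dataset).
X, y = make_classification(
    n_samples=2000, n_features=40, n_informative=10, n_classes=3,
    weights=[0.80, 0.15, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale, reduce to 16 dimensions, then classify with a random forest.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=16),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
macro_f1 = f1_score(y_test, y_pred, average="macro")
reduced = pipeline[:-1].transform(X_test)  # output of the scaling + PCA steps
print(reduced.shape, round(accuracy, 3), round(macro_f1, 3))
```

Macro-averaged F1 gives every class equal say regardless of its size, which is why it tends to expose weaknesses on rare attacks that plain accuracy can hide.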

What this means for everyday security

For non-specialists, the key message is that this work offers a more discerning lens for network monitoring. Instead of simply compressing data to make it smaller, FI-AE learns to preserve the measurements that truly matter for spotting different types of attacks, including the rare ones that can be the most damaging. With just 16 distilled features, intrusion detection systems built on this approach can run more efficiently while still matching or surpassing state-of-the-art detection accuracy. In practice, that means security tools can scan more traffic, react more quickly, and provide better protection for the digital services people rely on every day.

Citation: Abdel-Rahman, M.A., Alluhaidan, A.S., El-Rahman, S.A. et al. Feature importance guided autoencoder for dimensionality reduction in intrusion detection systems. Sci Rep 16, 5013 (2026). https://doi.org/10.1038/s41598-026-36695-9

Keywords: intrusion detection, network security, dimensionality reduction, autoencoder, feature importance