Clear Sky Science
Advancing APT detection through transformer-driven feature learning and synthetic data generation
Why Hidden Cyberattacks Matter
Modern organizations depend on computer networks that hum with constant activity, from web browsing to critical government services. Buried in this digital noise, however, are some of the most dangerous cyber threats: advanced persistent threats (APTs). These long‑running, stealthy attacks are often backed by highly skilled groups and can quietly burrow into systems for months. The paper introduces a new method, called ET‑SDG, that uses recent advances in artificial intelligence to sift through vast streams of network traffic, learn what truly suspicious behavior looks like, and spot rare but severe APT activity more reliably than previous tools.
The Challenge of Finding a Needle in a Digital Haystack
APT campaigns differ from everyday malware because they are slow, adaptive, and carefully targeted. They use tricks such as exploiting unknown software flaws and hiding their communications inside normal‑looking traffic. Traditional intrusion detection systems rely on fixed rules or known signatures, which means new or modified attacks can slip through. Recent research has turned to machine learning to hunt for subtle patterns in network “flows” — summaries of who talked to whom, for how long, and how much data was exchanged. But two problems remain: the patterns inside these flows are complicated, and real‑world data is heavily unbalanced, with far more normal traffic than confirmed APT attacks. This imbalance can cause AI systems to become excellent at recognizing normal behavior while quietly overlooking the rare events that matter most.
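To make the idea of a flow concrete, here is a minimal Python sketch that rolls individual packet records up into per-pair summaries of who talked to whom, for how long, and how much data was exchanged. The `FlowSummary` fields, the `summarize_flows` helper, and the sample addresses are illustrative assumptions, not the feature set used in the paper. Production tools usually also key flows on ports and protocol.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FlowSummary:
    """Hypothetical flow record: one row per (source, destination) pair."""
    src_ip: str
    dst_ip: str
    duration_s: float
    bytes_total: int
    packets: int


def summarize_flows(packets):
    """Group (timestamp, src, dst, size) packet records into flow summaries."""
    grouped = defaultdict(list)
    for timestamp, src, dst, size in packets:
        grouped[(src, dst)].append((timestamp, size))

    flows = []
    for (src, dst), records in grouped.items():
        times = [t for t, _ in records]
        flows.append(
            FlowSummary(
                src_ip=src,
                dst_ip=dst,
                duration_s=max(times) - min(times),  # first to last packet
                bytes_total=sum(size for _, size in records),
                packets=len(records),
            )
        )
    return flows


# Made-up packet log using documentation-reserved addresses.
packets = [
    (0.0, "10.0.0.5", "203.0.113.7", 400),
    (1.5, "10.0.0.5", "203.0.113.7", 1200),
    (2.0, "10.0.0.8", "198.51.100.2", 60),
    (4.0, "10.0.0.5", "203.0.113.7", 400),
]
flows = summarize_flows(packets)
for flow in flows:
    print(flow)
```

Each summary row is the kind of numerical record a machine-learning detector consumes in place of raw packets.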

A Smarter Way to Read Network Flows
The ET‑SDG framework tackles the first problem, understanding complex traffic, by breaking the job into stages. It starts with dozens of numerical descriptors for each network flow. A method known as ExtraTrees acts like a fast, rough reviewer: it builds a large ensemble of randomized decision trees, scores each feature by how much it helps tell attack traffic from normal traffic, and discards the features that contribute little. The trimmed‑down data is then passed to a Transformer, a family of models best known for powering modern language tools. Instead of reading words in a sentence, the Transformer here “reads” traffic features, using its attention mechanism to learn how different properties of a connection influence each other. The result is a compact, context‑aware fingerprint for each pair of communicating machines, rich enough to capture the behavior of multi‑step APT campaigns.
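The two stages can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the feature ranking uses depth-one trees with randomly drawn thresholds (a simplification in the spirit of ExtraTrees), the attention step uses identity projections instead of learned weights, and the toy data, the function names, and the `[value, position]` token encoding are all hypothetical.

```python
import math
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def random_stump_importances(X, y, n_stumps=200, seed=0):
    """Rank features with depth-one trees that use random thresholds.

    Each stump draws one random threshold per feature, keeps the feature
    whose split reduces impurity most, and credits it with that gain.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    scores = [0.0] * n_features
    parent = gini(y)
    for _ in range(n_stumps):
        best_gain, best_feature = 0.0, None
        for j in range(n_features):
            column = [row[j] for row in X]
            threshold = rng.uniform(min(column), max(column))
            left = [lab for val, lab in zip(column, y) if val <= threshold]
            right = [lab for val, lab in zip(column, y) if val > threshold]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if parent - child > best_gain:
                best_gain, best_feature = parent - child, j
        if best_feature is not None:
            scores[best_feature] += best_gain
    total = sum(scores) or 1.0
    return [s / total for s in scores]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    weights = []
    for query in tokens:
        logits = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
                  for key in tokens]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        norm = sum(exps)
        weights.append([e / norm for e in exps])
    mixed = [[sum(w * tok[i] for w, tok in zip(row, tokens)) for i in range(d)]
             for row in weights]
    return weights, mixed


# Toy flows (made up): feature 0 separates the classes, 1 and 2 are noise.
rng = random.Random(42)
X, y = [], []
for _ in range(40):
    X.append([rng.uniform(0, 1), rng.random(), rng.random()])
    y.append(0)  # benign
for _ in range(20):
    X.append([rng.uniform(2, 3), rng.random(), rng.random()])
    y.append(1)  # attack

importances = random_stump_importances(X, y)
# Keep the two highest-scoring features and drop the rest.
keep = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)[:2]

# Turn each kept feature of one flow into a token: [value, position index].
flow = X[-1]
tokens = [[flow[j], float(j)] for j in keep]
weights, mixed = self_attention(tokens)
print("importances:", [round(s, 3) for s in importances])
print("attention weights:", weights)
```

Each row of `weights` says how strongly one feature attends to the others when its representation is rebuilt. A trained Transformer learns the projections that shape these weights rather than using the raw values directly.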
Creating Realistic Examples of Rare Attacks
The second major hurdle is that there are very few confirmed APT instances compared with mountains of benign traffic. Simply copying these scarce attack records, as basic oversampling techniques do, risks teaching the model to memorize rather than to generalize. ET‑SDG addresses this with a Conditional Generative Model for Synthesis (CGMS), built on a type of neural network known as a conditional generative adversarial network. Its generator network learns to create new synthetic data points that statistically resemble known APT behavior, while a second network, the discriminator, tries to tell real samples from synthetic ones. Training the two together yields additional, varied examples of attack traffic. These synthetic examples are added only to the training data, so the evaluation data is not contaminated. An attention‑based layer then focuses on the most informative parts of these enriched representations before a final classifier decides whether an IP pair is likely benign or under attack.
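The ordering matters as much as the generator: the data is split first, and synthetic attacks are added only to the training portion. The sketch below illustrates that ordering alone. The Gaussian jitter is a deliberately crude, hypothetical stand-in for a trained conditional GAN generator, and the function names and toy data are assumptions made for illustration.

```python
import random


def stratified_split(labels, test_fraction, rng):
    """Split indices per class so both sets contain every class."""
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx


def split_then_augment(samples, labels, test_fraction=0.25, seed=0):
    """Hold out a test set, then balance ONLY the training set."""
    rng = random.Random(seed)
    train_idx, test_idx = stratified_split(labels, test_fraction, rng)
    train = [(samples[i], labels[i]) for i in train_idx]
    test = [(samples[i], labels[i]) for i in test_idx]

    attacks = [x for x, label in train if label == 1]
    benign_count = sum(1 for _, label in train if label == 0)
    synthetic = []
    while attacks and len(attacks) + len(synthetic) < benign_count:
        base = rng.choice(attacks)
        # Placeholder generator (assumption): small Gaussian jitter around a
        # real training attack. A conditional GAN would sample from a learned
        # distribution instead.
        synthetic.append(([v + rng.gauss(0.0, 0.05) for v in base], 1))
    return train + synthetic, test


# Toy data (made up): 40 benign flows and 8 attack flows, 2 features each.
rng = random.Random(1)
samples = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
samples += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(8)]
labels = [0] * 40 + [1] * 8

train_aug, test = split_then_augment(samples, labels)
print("training size:", len(train_aug), "attacks:", sum(l for _, l in train_aug))
print("test size:", len(test), "attacks:", sum(l for _, l in test))
```

Because the test set is carved out before any synthesis, every evaluation example is a real record, and the reported scores reflect performance on the original imbalanced data.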

Testing on Real and Difficult Datasets
To see whether this design pays off, the authors evaluated ET‑SDG on a combined dataset of real APT malware captures and government network traffic, as well as a large public intrusion‑detection benchmark famed for its severe class imbalance. They compared their system with a range of alternatives, from simpler deep‑learning models that process flows like time series, to graph‑based approaches that emphasize relationships among machines. Across multiple measures — including accuracy, precision, recall, and F1‑score — ET‑SDG consistently matched or outperformed most competitors, often improving results by one to four percentage points. Importantly, it did so while keeping both missed attacks and false alarms low, and its performance remained stable when the data was reshuffled in repeated cross‑validation tests.
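The four measures named above are all derived from the same counts of hits, misses, and false alarms. The sketch below computes them for two hypothetical detectors on made-up, heavily imbalanced data (the numbers are not taken from the paper). It shows why accuracy alone can mislead when attacks are rare.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed attacks
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign passed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up data: 990 benign flows followed by 10 attack flows.
y_true = [0] * 990 + [1] * 10

# Detector A never raises an alert.
always_benign = [0] * 1000

# Detector B raises 4 false alarms and catches 8 of the 10 attacks.
detector = [1] * 4 + [0] * 986 + [1] * 8 + [0] * 2

baseline_scores = classification_metrics(y_true, always_benign)
detector_scores = classification_metrics(y_true, detector)
print("always benign:", baseline_scores)
print("detector:     ", detector_scores)
```

The do-nothing detector reaches 99% accuracy while catching no attacks at all, which is why recall and F1 are reported alongside accuracy on imbalanced benchmarks.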
What This Means for Everyday Security
For a non‑specialist, the key takeaway is that ET‑SDG offers a more nuanced way to watch network traffic. By first learning which details matter, then interpreting them in context, and finally inventing realistic extra examples of rare attacks, the system becomes better at picking out stealthy APT behavior from everyday digital chatter. While the approach is more computationally demanding than older methods and has so far been tested mainly in offline experiments, it shows that combining advanced pattern‑recognition with careful synthetic data generation can significantly strengthen early warning systems. In practical terms, this could help security teams spot serious intrusions sooner, focus on higher‑quality alerts, and better protect critical services from long‑term compromise.
Citation: Danh, L.T.K., Xuan, C.D. & Van, N.N. Advancing APT detection through transformer-driven feature learning and synthetic data generation. Sci Rep 16, 11772 (2026). https://doi.org/10.1038/s41598-026-41317-5
Keywords: advanced persistent threats, network intrusion detection, transformer models, synthetic data generation, cybersecurity AI