Clear Sky Science

A dataset collected in real-world industrial control systems for network attack detection


Why hidden attacks on factory networks matter to you

Electricity, clean water, and manufactured goods all depend on unseen computers that quietly steer pumps, turbines, and valves. As these industrial control systems connect to wider networks to become "smart" and efficient, they also inherit the same cyber risks as office PCs and home routers. This paper introduces ICS-NAD, a large, realistic collection of network data from real industrial sites that is designed to help researchers spot and stop cyberattacks before they disrupt daily life.

Figure 1.

Modern factories are no longer sealed off

Industrial control systems used to be physically isolated, with little or no link to the internet. In the push toward Industry 4.0, companies now connect these systems so they can monitor equipment remotely, analyze performance, and apply artificial intelligence. The flip side is that attackers can also reach in through these digital pathways. Around the world, serious incidents have already hit power, water, and other critical services, showing that the stakes are high. Detecting intrusions early requires good training data for security tools, but the few existing datasets are often small or artificial, or they lack the right kinds of attacks and labels.

Building a more lifelike picture of industrial networks

The authors address these gaps by creating ICS-NAD, a benchmark dataset recorded from a large test site that mirrors real industry. The site includes ten brands of industrial controllers and ten different process setups; for the dataset, they focus on three well-known brands used in a thermal power plant mockup and a sewage treatment mockup. Each brand uses a different, widely deployed industrial protocol that transmits messages without encryption, which lets the researchers observe fine-grained details of how devices talk to one another. Network traffic is captured directly from switches as human–machine interfaces send commands to programmable logic controllers, which in turn drive pumps, heaters, and other equipment.

Capturing many ways to break a system

To reflect the variety of real threats, ICS-NAD includes 20 common attack types grouped into four families. Reconnaissance attacks quietly scan for active devices and open ports. Denial-of-service and distributed denial-of-service attacks flood the network with packets, aiming to overwhelm devices so that legitimate commands are delayed or dropped. False-data-injection attacks forge messages and responses to mislead controllers or operators, while man-in-the-middle attacks sit between devices, altering traffic in transit. For each scenario, the researchers record not only the raw packets but also when each attack starts and stops, and then apply a two-step labeling method that combines these time logs with attack-specific rules. This produces clear labels indicating whether each observed flow is harmless or belongs to a particular attack.
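To make the two-step labeling idea concrete, here is a minimal sketch in Python. It is not the authors' code: the summary does not spell out their attack-specific rules, so the rule used here (a flow counts as attack traffic only if it starts inside a logged attack window and involves the attacker's address) is an assumption, and the names Flow, AttackWindow, label_flow, and the IP addresses are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    start: float   # flow start time in seconds
    src_ip: str
    dst_ip: str

@dataclass
class AttackWindow:
    label: str        # attack name, e.g. "syn_flood" (hypothetical)
    start: float      # logged attack start time in seconds
    end: float        # logged attack end time in seconds
    attacker_ip: str  # assumed rule input: the attacking host's address

def label_flow(flow: Flow, windows: list[AttackWindow]) -> str:
    """Return an attack label, or "benign" if no window and rule match."""
    for w in windows:
        # Step 1: time-log check, did the flow start while this attack was running?
        in_window = w.start <= flow.start <= w.end
        # Step 2: attack-specific rule (assumed here: the flow involves the attacker).
        matches_rule = w.attacker_ip in (flow.src_ip, flow.dst_ip)
        if in_window and matches_rule:
            return w.label
    return "benign"

windows = [AttackWindow("syn_flood", 100.0, 160.0, "10.0.0.66")]
flows = [
    Flow(50.0, "10.0.0.2", "10.0.0.10"),    # before the attack
    Flow(120.0, "10.0.0.66", "10.0.0.10"),  # attacker traffic during the window
    Flow(130.0, "10.0.0.2", "10.0.0.10"),   # normal traffic during the window
]
labels = [label_flow(f, windows) for f in flows]
print(labels)  # ['benign', 'syn_flood', 'benign']
```

The second step is what keeps ordinary traffic that happens to occur during an attack from being mislabeled; a time log alone would have tagged the third flow as part of the flood.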

Figure 2.

Seeing traffic patterns before and during an attack

Beyond simply logging packets, the team extracts 60 descriptive features from the traffic, such as how many packets move in each direction, how large they are, and how quickly they arrive. These features cover both coarse trends over time and fine details inside individual packets. By examining traffic from one of the control systems, they show how an intensive flood attack changes the rhythm of communication: bursts of packets become sharper, peaks higher, and idle gaps shorter, all of which can be captured by statistical measures. This richer view helps algorithms distinguish natural fluctuations in industrial activity from suspicious surges caused by an intruder.
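The sketch below shows, under simplifying assumptions, how a handful of such statistics can separate steady polling from a flood. It computes only seven illustrative features (packet counts per direction, packet sizes, packet rate, and inter-arrival gaps), not the paper's 60, and the packet format and direction names "fwd"/"bwd" are assumptions made for this example.

```python
import statistics

def flow_features(packets):
    """Summarize a non-empty list of (timestamp_s, size_bytes, direction) packets.

    direction is "fwd" (HMI to PLC) or "bwd" (PLC to HMI); these names are assumptions.
    """
    times = sorted(t for t, _, _ in packets)
    sizes = [s for _, s, _ in packets]
    gaps = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    duration = times[-1] - times[0]
    return {
        "fwd_packets": sum(1 for _, _, d in packets if d == "fwd"),
        "bwd_packets": sum(1 for _, _, d in packets if d == "bwd"),
        "mean_size": statistics.mean(sizes),
        "max_size": max(sizes),
        # packet rate; a zero-length flow falls back to the raw packet count
        "packets_per_s": len(packets) / duration if duration > 0 else float(len(packets)),
        "mean_gap": statistics.mean(gaps) if gaps else 0.0,
        "min_gap": min(gaps) if gaps else 0.0,
    }

# Normal polling: one request per second, each answered 10 ms later.
normal = [(i * 1.0, 64, "fwd") for i in range(10)] + [(i * 1.0 + 0.01, 80, "bwd") for i in range(10)]
# Flood: 100 small packets sent 1 ms apart, with no replies.
flood = [(i * 0.001, 60, "fwd") for i in range(100)]

for name, pkts in (("normal", normal), ("flood", flood)):
    f = flow_features(pkts)
    print(name, round(f["packets_per_s"], 1), round(f["mean_gap"], 4))
```

In this toy example the flood's packet rate is hundreds of times higher and its gaps far shorter than normal polling, and the missing replies show up as a lopsided forward/backward count. These are the kinds of signals a detector can learn from.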

Putting the dataset to the test with learning machines

To demonstrate that ICS-NAD is practical, the authors use it to train and evaluate ten different machine-learning and deep-learning methods, ranging from classic decision trees and nearest-neighbor schemes to modern boosted trees and neural networks. After basic cleaning and scaling, they automatically select a small set of the most informative features, largely related to the size and content of traffic flows. Even with only four features per model, most methods reach high scores in identifying attacks across all four attack families, often above 90 percent for accuracy, recall, precision, and F1-score. This suggests that ICS-NAD contains enough variety and realism for researchers to build and compare advanced detection tools.
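The four scores quoted above come from counting hits and misses. The following is a minimal sketch of the standard formulas, treating one class ("attack") as positive; it is an illustration, not the authors' evaluation code. The summary does not say how the scores were averaged across the four attack families, so per-class or macro averaging is left open here.

```python
def detection_metrics(y_true, y_pred, positive="attack"):
    """Accuracy, precision, recall and F1 with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # attacks caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # attacks missed
    tn = len(pairs) - tp - fp - fn                               # benign traffic passed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy labels: 5 attacks and 5 benign flows; one attack is missed and one benign flow is flagged.
y_true = ["attack"] * 5 + ["benign"] * 5
y_pred = ["attack"] * 4 + ["benign"] + ["attack"] + ["benign"] * 4
print(detection_metrics(y_true, y_pred))
```

Reporting all four numbers matters because accuracy alone can look high on data where benign traffic dominates, while precision exposes false alarms and recall exposes missed attacks.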

What this means for safer infrastructure

In plain terms, ICS-NAD is like a detailed flight recorder for factory networks: it captures how real industrial systems behave under normal conditions and under many different kinds of cyber fire. Because it is large, diverse, and openly available, it gives security researchers, engineers, and students a shared testing ground to develop better alarms for critical infrastructure. As utilities and factories continue to connect more of their equipment, datasets like ICS-NAD will be vital for turning raw network chatter into early warning systems that help keep lights on, taps running, and production lines moving.

Citation: Zhou, X., Cheng, Z., Wang, C. et al. A dataset collected in real-world industrial control systems for network attack detection. Sci Data 13, 399 (2026). https://doi.org/10.1038/s41597-026-06738-x

Keywords: industrial control systems, cyberattack detection, network intrusion dataset, critical infrastructure security, machine learning security