Clear Sky Science
A Graph-based Benchmark dataset for Printed Circuit Netlist Partitioning
Why breaking down circuit blueprints matters
Every electronic gadget you own, from your smartwatch to your smart fridge, relies on printed circuit boards (PCBs)—the green plates packed with tiny components and copper tracks. When engineers or security analysts need to understand how a board works, or check it for hidden weaknesses and tampering, they first have to turn a low-level wiring description, called a netlist, into meaningful functional blocks such as power supplies, filters, or communication units. This paper introduces BenchPCNP, the first large, openly available benchmark dataset that helps researchers train and compare modern AI methods to automatically split complex PCB netlists into such understandable modules.

From messy wiring lists to meaningful building blocks
A PCB netlist is essentially a long machine-readable list describing which component pins are connected together. On its own, it is like having every street intersection in a city written down, but no city map. Netlist partitioning tackles this problem by grouping components into functional sub-circuits—turning that unwieldy list into recognizable units like “power input,” “signal filter,” or “LED driver.” This is crucial in electronic design automation, where engineers want tools that can reorganize, debug, or redesign circuits quickly, and in reverse engineering, where teams must analyze third-party boards for security checks and intellectual property protection.
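To make the "map" idea concrete, here is a minimal sketch that stores a tiny netlist and a partition as plain Python dictionaries. The component designators, net names, and module labels are hypothetical, and this is not the BenchPCNP file format. It only shows that a partition is one module label per component, and that nets spanning several modules mark the interfaces between functional blocks.

```python
# Toy netlist (hypothetical): each net maps to the (component, pin) pairs it connects.
NETLIST = {
    "VIN":   [("J1", "1"), ("U1", "1"), ("C1", "1")],
    "VOUT":  [("U1", "3"), ("C2", "1"), ("R1", "1")],
    "LED_A": [("R1", "2"), ("D1", "1")],
    "GND":   [("J1", "2"), ("U1", "2"), ("C1", "2"), ("C2", "2"), ("D1", "2")],
}

# A partition assigns every component to exactly one functional module.
PARTITION = {
    "J1": "power input",
    "U1": "power supply", "C1": "power supply", "C2": "power supply",
    "R1": "LED driver", "D1": "LED driver",
}

def modules_per_net(netlist, partition):
    """For each net, the sorted list of modules its pins belong to."""
    return {net: sorted({partition[comp] for comp, _pin in pins})
            for net, pins in netlist.items()}

def boundary_nets(netlist, partition):
    """Nets touching more than one module, i.e. the interfaces between blocks."""
    return sorted(net for net, mods in modules_per_net(netlist, partition).items()
                  if len(mods) > 1)
```

Here `boundary_nets(NETLIST, PARTITION)` returns `["GND", "VIN", "VOUT"]`, while `LED_A` stays inside the LED driver module.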
Why new data is needed for smarter tools
Recent advances in artificial intelligence, especially in graph neural networks that are designed to reason over interconnected data, promise big improvements in how we automatically understand circuits. However, progress has been slowed by a lack of high-quality, shared datasets with reliable ground-truth labels. Earlier studies used small or private collections of PCB netlists, making it hard to compare methods or reproduce results. BenchPCNP directly addresses this gap by collecting 50 real, production-verified PCB designs and carefully labeling how each circuit is divided into functional modules, creating a common testbed for the community.
How the dataset was built from real-world designs
The authors obtained historical PCB projects from a professional design house, ensuring that the circuits are realistic and have already worked in practice. For each design, they took the master netlist file in a standard Protel 2 format and its associated sub-netlists that reflect the designer’s intended module breakdown, based on an industry guideline known as IPC-2612. To protect commercial secrets, schematic drawings were removed, but the netlists—containing component names and electrical connections—were preserved. Five experienced designers cross-checked all files, then divided each full circuit into functional modules, such as power supply, filtering, LED control, and connectors, creating 54 distinct module categories that mirror how engineers reuse building blocks across many products.
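As a rough illustration of what a Protel-style netlist holds, the sketch below reads one into components and nets. It assumes the commonly seen layout, where each component is a block in square brackets listing designator, footprint, and part comment on separate lines, and each net is a block in parentheses listing the net name followed by `DESIGNATOR-PIN` entries. The function `parse_protel_netlist` is a hypothetical helper, not the authors' preprocessing script, and real exports may contain variations it does not handle.

```python
def parse_protel_netlist(text):
    """Hypothetical parser for a Protel-style netlist (assumed layout, see lead-in)."""
    components, nets = {}, {}
    opener, block = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if opener is None:
            # Outside any block: wait for '[' (component) or '(' (net).
            if line in ("[", "("):
                opener, block = line, []
            continue
        if (opener, line) in (("[", "]"), ("(", ")")):
            # Matching closer: store the finished block.
            if block and opener == "[":
                components[block[0]] = {
                    "footprint": block[1] if len(block) > 1 else "",
                    "comment": block[2] if len(block) > 2 else "",
                }
            elif block:
                # 'R1-2' -> ('R1', '2'); split on the last hyphen only.
                nets[block[0]] = [tuple(node.rsplit("-", 1)) for node in block[1:]]
            opener = None
        elif line:
            # Blank lines are skipped, so an empty field would shift later ones.
            block.append(line)
    return components, nets

SAMPLE = """
[
R1
0805
10k



]
[
D1
LED0603
RED



]
(
LED_A
R1-2
D1-1
)
"""
```

Calling `parse_protel_netlist(SAMPLE)` yields two components (`R1`, `D1`) with their footprints and comments, and one net `LED_A` connecting pin 2 of `R1` to pin 1 of `D1`.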
Turning circuits into graphs for machine learning
To make the data usable for AI models, the team translated each netlist into a graph, where nodes represent components and links represent electrical connections. They distilled each component down to three practical attributes—what kind of part it is (for example, resistor or chip), its package style, and how many pins it effectively has—and encoded these as simple numeric features. Two complementary views of connectivity were created. In the traditional graph view, components are connected pairwise. In the hypergraph view, a single “hyperedge” can link many components that share the same electrical net, capturing multi-way relationships more directly and reducing redundant links. Both views are provided in standardized JSON files, along with scripts so other researchers can rebuild or extend the data.
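The difference between the two connectivity views can be sketched in a few lines. In the pairwise view (often called a clique expansion), every pair of components on a net gets its own link; in the hypergraph view, each net becomes a single hyperedge. The category vocabularies, the feature layout, and the toy nets below are hypothetical illustrations, not BenchPCNP's actual encoding or JSON schema.

```python
from itertools import combinations

# Hypothetical category vocabularies, for illustration only.
TYPE_IDS = {"resistor": 0, "capacitor": 1, "diode": 2, "ic": 3, "connector": 4}
PACKAGE_IDS = {"0603": 0, "0805": 1, "SOT-23": 2, "HDR-2": 3}

def node_features(parts):
    """Encode each component as [part-type id, package id, effective pin count]."""
    return {name: [TYPE_IDS[kind], PACKAGE_IDS[pkg], pins]
            for name, (kind, pkg, pins) in parts.items()}

def clique_expansion(nets):
    """Pairwise view: link every pair of components that share a net.

    Pairs sharing several nets collapse into one edge, so net membership is lost.
    """
    edges = set()
    for members in nets.values():
        edges.update(combinations(sorted(set(members)), 2))
    return edges

def hypergraph(nets):
    """Hypergraph view: one hyperedge per net, containing all its components."""
    return {net: sorted(set(members)) for net, members in nets.items()}

NETS = {
    "GND":   ["J1", "U1", "C1", "C2", "C3", "D1"],
    "VOUT":  ["U1", "C2", "R1"],
    "LED_A": ["R1", "D1"],
}

# 15 pairs from GND, 2 new ones from VOUT (C2-U1 already exists), 1 from LED_A.
edges = clique_expansion(NETS)
hyperedges = hypergraph(NETS)
```

A single six-component ground net already produces 15 pairwise links but only one hyperedge; on real boards, where power and ground nets can touch dozens of parts, this gap grows quadratically.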

What the benchmark reveals about smarter models
Using BenchPCNP, the authors tested several popular graph neural networks and their hypergraph-based counterparts on the task of assigning each component to its correct module. They found that models using the hypergraph view consistently achieved higher accuracy and F1 scores than those using only traditional pairwise graphs, especially in the presence of large shared networks like power and ground. The dataset also shows realistic imbalances: common modules, like power and filters, contain many more components than rare ones, such as special serial links or current limiters. Experiments where the amount of training data was gradually increased showed steady gains in performance, indicating that BenchPCNP scales well as a testbed for future, larger studies.
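This summary does not say which F1 averaging the authors used, so the sketch below assumes macro-averaged F1 to show why the class imbalance matters. A model that labels every component with the dominant module can score high accuracy while completely missing rare modules; macro-F1 weights each module equally and exposes that failure. The module names and counts are invented for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of components assigned to the correct module."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-module F1, so rare modules count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy imbalanced example: 8 power-module parts, 2 current-limiter parts,
# and a lazy model that labels everything "power".
y_true = ["power"] * 8 + ["limiter"] * 2
y_pred = ["power"] * 10
```

Here accuracy is 0.8, but macro-F1 is about 0.44: the power module scores F1 = 16/18, the limiter module scores 0, and the two are averaged equally.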
What this means for future electronics and security
In everyday terms, BenchPCNP gives researchers a shared, trustworthy “map collection” of real circuit cities, along with clear neighborhood labels, so that new AI tools can learn to read and reconstruct PCBs more reliably. By showing that hypergraph-based models better capture the structure of circuit connections on this benchmark, the work points developers toward more accurate and efficient methods for automatic module discovery. In the long run, this benchmark should speed up the development of smarter design aids and more powerful reverse-engineering tools, helping engineers build safer electronics and detect hidden problems in the hardware that underpins modern life.
Citation: Yang, J., Qiao, K., Chen, J. et al. A Graph-based Benchmark dataset for Printed Circuit Netlist Partitioning. Sci Data 13, 522 (2026). https://doi.org/10.1038/s41597-026-06818-y
Keywords: printed circuit boards, graph neural networks, netlist partitioning, hypergraph modeling, hardware security