
A stacking ensemble with Pareto optimization for scalable electricity theft detection via hybrid data repair and lightweight deployment


Why stolen electricity matters to everyone

Electricity theft may sound like a distant problem, but it quietly raises power bills, strains the grid, and increases the chances of blackouts. Around the world, people tap lines illegally or tamper with meters, costing utilities billions of dollars each year. This study introduces a new way to spot such theft automatically in the vast streams of data coming from smart meters, aiming to protect both grid stability and honest customers’ wallets.

How smart meters can both help and mislead

Modern smart meters record how much electricity homes and businesses use every day, creating a detailed picture of demand over time. In principle, unusual patterns in these records can reveal theft, such as sudden drops in reported usage or oddly irregular bursts. In practice, however, the data are messy: readings go missing, some are corrupted, and genuine customers greatly outnumber thieves. Simple rules or older software either miss too many theft cases or trigger too many false alarms, making them hard to trust in real-world operations.

Figure 1.

Cleaning up flawed data before making judgments

The researchers designed a full pipeline, called STL-Net, that treats data quality as seriously as the final prediction. First, it repairs missing readings through a hybrid process that combines several techniques, choosing different methods depending on how incomplete each part of the data is. Next, it tackles the fact that theft cases are rare by carefully rebalancing the data so the learning algorithms see enough examples of suspicious behavior without overfitting. Finally, it compresses long histories of daily usage into a smaller set of summary features that still preserve key patterns, making the problem faster to solve while remaining understandable.
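To make the repair step concrete, here is a minimal sketch of a hybrid gap-filling rule, assuming missing readings are marked as None. The summary does not say which techniques STL-Net combines or where it switches between them, so the function name `repair_gaps`, the threshold `max_interp_gap`, and the choice of linear interpolation versus a median fallback are all illustrative assumptions.

```python
from statistics import median


def repair_gaps(readings, max_interp_gap=3):
    """Fill None values in a list of daily readings.

    Assumed rule (not taken from the study): short gaps with observed
    neighbours on both sides are linearly interpolated; longer gaps and
    gaps at either end fall back to the median of the observed readings.
    """
    observed = [r for r in readings if r is not None]
    if not observed:
        raise ValueError("series has no observed readings")
    fallback = median(observed)

    repaired = list(readings)
    i, n = 0, len(readings)
    while i < n:
        if readings[i] is not None:
            i += 1
            continue
        # Find the end of this run of missing values.
        j = i
        while j < n and readings[j] is None:
            j += 1
        gap = j - i
        interior = i > 0 and j < n
        if interior and gap <= max_interp_gap:
            left, right = readings[i - 1], readings[j]
            step = (right - left) / (gap + 1)
            for k in range(gap):
                repaired[i + k] = left + step * (k + 1)
        else:
            for k in range(i, j):
                repaired[k] = fallback
        i = j
    return repaired


short_gap = repair_gaps([1.0, None, 3.0])
long_gap = repair_gaps([2.0, None, None, None, None, 10.0])
```

In this sketch the single missing day in `short_gap` is interpolated from its neighbours, while the four-day gap in `long_gap` exceeds the assumed threshold and is filled with the median instead.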

Stacking several smart models instead of one big black box

At the heart of STL-Net is an approach known as stacking: instead of trusting a single prediction model, the system trains several different ones and then learns how best to combine their outputs. Here, four advanced tree-based models each estimate the chance that a customer is stealing electricity. A fifth model then learns how to weigh and fuse these individual opinions into a final decision. To avoid building an overcomplicated system, the authors use a genetic search strategy that looks for model settings that balance two goals at once: high accuracy and low computational cost. This “Pareto” optimization keeps only configurations that cannot be made more accurate without becoming more expensive to run, or cheaper without losing accuracy, so the chosen settings strike a balance rather than excelling on one goal at the expense of the other.
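The two-goal trade-off can be illustrated with a small Pareto filter. This is a sketch under stated assumptions, not the authors' genetic search: it only shows how non-dominated configurations are picked once candidates have been scored. The field names `accuracy` and `cost` and the four candidate values are invented for illustration and are not results from the study.

```python
def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    A candidate dominates another if it is at least as accurate and at
    least as cheap, and strictly better on one of the two.
    """
    def dominates(a, b):
        no_worse = a["accuracy"] >= b["accuracy"] and a["cost"] <= b["cost"]
        better = a["accuracy"] > b["accuracy"] or a["cost"] < b["cost"]
        return no_worse and better

    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical configurations: accuracy is higher-is-better,
# cost (for example seconds per batch) is lower-is-better.
configs = [
    {"name": "A", "accuracy": 0.95, "cost": 10.0},
    {"name": "B", "accuracy": 0.93, "cost": 4.0},
    {"name": "C", "accuracy": 0.90, "cost": 6.0},  # worse than B on both goals
    {"name": "D", "accuracy": 0.85, "cost": 1.0},
]
front_names = [c["name"] for c in pareto_front(configs)]
```

Here configuration C drops out because B is both more accurate and cheaper, while A, B, and D remain as different points along the accuracy-versus-cost trade-off.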

Figure 2.

Fast enough for the field, and open to inspection

On a large real dataset from the State Grid Corporation of China, covering more than a thousand days of usage for over forty thousand customers, STL-Net caught theft with very high reliability. It outperformed a wide range of standard machine‑learning methods and deep neural networks, achieving both strong scores for correctly identifying thieves and low rates of mislabeling honest users. The team also built a lighter version, STL-Lite, which removes the slowest component to cut response time by about 40%, making it more practical for devices with limited computing power while preserving nearly the same detection quality.

Seeing why the system flags a customer

Beyond raw accuracy, utilities and regulators need to understand why a system accuses a customer of theft. STL-Net addresses this with an explanation technique that attributes each decision to the features that influenced it most, such as recent changes in consumption over specific time windows. These explanations reveal that the model focuses on sustained, suspicious shifts in recent usage rather than on isolated spikes, and they allow operators to inspect borderline cases more carefully. This transparency helps turn the model from a mysterious black box into a decision aid that can be audited and trusted.
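One simple way to picture per-decision explanations is occlusion: replace one feature at a time with a neutral baseline and see how much the suspicion score changes. The summary does not name the explanation technique STL-Net uses, so this is a stand-in sketch; the scoring function `toy_score`, its weights, and the feature names `drop_last_30d` and `spike_count` are hypothetical.

```python
def occlusion_attributions(score_fn, features, baseline):
    """Attribute a score to features by masking each one in turn.

    The attribution for a feature is the score with all features present
    minus the score when that feature is reset to its baseline value.
    """
    full_score = score_fn(features)
    attributions = {}
    for name in features:
        masked = dict(features)
        masked[name] = baseline[name]
        attributions[name] = full_score - score_fn(masked)
    return attributions


def toy_score(f):
    # Hypothetical suspicion score: a sustained recent drop in usage
    # counts for much more than the number of isolated spikes.
    return 0.5 * f["drop_last_30d"] + 0.1 * f["spike_count"]


customer = {"drop_last_30d": 0.8, "spike_count": 1.0}
neutral = {"drop_last_30d": 0.0, "spike_count": 0.0}
attributions = occlusion_attributions(toy_score, customer, neutral)
top_feature = max(attributions, key=attributions.get)
```

With these made-up numbers the sustained drop receives the larger attribution, which mirrors the kind of ranking an operator would review before acting on a flag.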

What this means for future power bills and reliability

In simple terms, the study shows that it is possible to build an electricity theft detector that is accurate, efficient, and explainable all at once. By carefully repairing data, balancing rare theft cases, combining several complementary models, and keeping an eye on computing speed, STL-Net offers a practical tool for utilities. If adopted and tailored to local conditions, such systems could reduce hidden losses, support fairer billing, and help keep the grid more stable for everyone who depends on it.

Citation: Rahaman, M.A., Mohamad Idris, R. A stacking ensemble with Pareto optimization for scalable electricity theft detection via hybrid data repair and lightweight deployment. Sci Rep 16, 14548 (2026). https://doi.org/10.1038/s41598-026-39693-z

Keywords: electricity theft, smart meters, machine learning, ensemble models, smart grid security