Clear Sky Science
Correlation based feature importance analysis for improving machine learning stability predictions in hybrid PV systems
Why keeping the lights steady is getting harder
As more homes and businesses run on solar power, keeping the electricity grid steady becomes trickier. Clouds passing over panels or sudden changes in demand can nudge voltages up and down in ways that traditional control methods were never designed to handle. This paper explores how modern machine learning can act like an early-warning system for such disturbances, predicting both grid voltage and overall stability in hybrid systems that mix solar power with conventional sources.

How a digital twin of a solar-powered grid was built
Instead of relying on noisy or incomplete field measurements, the authors first created a detailed “digital twin” of a grid-connected solar microgrid in MATLAB/Simulink. This virtual system includes solar panels, an electronic inverter that links them to the grid, and customer loads that change with voltage and frequency. By systematically varying sunlight, temperature, load demand, and inverter operating conditions, they generated 500 realistic operating scenarios. For each one, the model records grid current, voltage, and a composite stability score that reflects how well the system rides through disturbances, how tightly voltage and frequency are controlled, and whether the inverter is close to its limits.
Turning raw signals into meaningful clues
From these simulations, six key signals were chosen as inputs for prediction: ambient temperature, solar irradiance, load level, DC-link voltage, inverter output power, and grid current. The team normalized and cleaned the data, removed outliers, and then used a simple but powerful idea to highlight what matters most: correlation-based feature weighting. For each input, they measured how strongly it moves with the target outputs, grid voltage and the stability score. Features with stronger links were given higher weights before training the models. This step does not invent new data, but it nudges the learning process to pay more attention to physically important variables such as grid current and DC-link voltage.
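As a rough illustration of correlation-based feature weighting, the sketch below computes the absolute Pearson correlation between each input and one target, then rescales the results into weights. The column names, the toy values, the sum-to-one normalization, and the choice to multiply each column by its weight are all assumptions made for illustration; the summary does not give the authors' exact formula.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (spread_x * spread_y)


def correlation_weights(columns, target):
    """Map each feature name to |r| with the target, scaled so the weights sum to 1."""
    strengths = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    total = sum(strengths.values())
    if total == 0:
        # No feature correlates with the target: fall back to equal weights.
        return {name: 1.0 / len(columns) for name in columns}
    return {name: s / total for name, s in strengths.items()}


def apply_weights(columns, weights):
    """Scale every (already normalized) feature column by its weight."""
    return {name: [weights[name] * v for v in vals] for name, vals in columns.items()}


# Toy data (hypothetical values, not taken from the study).
columns = {
    "grid_current": [1.0, 2.0, 3.0, 4.0, 5.0],
    "temperature": [3.0, 1.0, 4.0, 1.0, 5.0],
}
voltage = [2.1, 3.9, 6.2, 8.1, 9.8]

weights = correlation_weights(columns, voltage)
weighted = apply_weights(columns, weights)
```

In this toy example the current column tracks the target almost linearly, so it receives the larger weight. Multiplying a column by a positive constant changes distance-based models such as Support Vector Regression, whereas tree splits are unaffected by that kind of rescaling, so tree ensembles would need the weights applied in some other way.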

Putting five learning machines to the test
With the weighted dataset in hand, the authors compared five popular machine learning approaches: Random Forests, Extra Trees, Support Vector Regression, CatBoost, and Gradient Boosting. All models were trained and tested under the same conditions, using an 80:20 split of the data and a common set of accuracy measures. These measures included the familiar coefficient of determination (how much of the variation is explained) and several error scores that look at both average mistakes and the spread of those mistakes. The study also went beyond headline numbers, examining how errors were distributed over time and how often predictions stayed within tight tolerance bands.
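The evaluation protocol described above can be sketched in a few lines. The example below performs a reproducible 80:20 split and computes the coefficient of determination alongside some common error scores and a tolerance-band hit rate. The specific error scores (MAE, RMSE, MAPE), the function names, and the sample numbers are assumptions; the summary does not list which error measures the authors used.

```python
import math
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices reproducibly and hold out test_fraction of them."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    cut = len(rows) - n_test
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


def regression_report(y_true, y_pred, tolerance):
    """Summarize prediction quality; assumes non-constant, non-zero targets."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total variation
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "mape_percent": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        # Share of predictions whose absolute error stays inside the band.
        "within_tolerance": sum(abs(e) <= tolerance for e in errors) / n,
    }


# Hypothetical usage: 500 scenario indices split 400/100, and a tiny
# made-up set of voltages scored against a 0.5 V tolerance band.
train_rows, test_rows = train_test_split(list(range(500)))
report = regression_report(
    y_true=[230.0, 231.0, 229.0, 232.0],
    y_pred=[230.2, 230.9, 229.6, 232.0],
    tolerance=0.5,
)
```

Reporting the tolerance-band share next to R² is one way to capture the spread of the errors, which a single average score can hide.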
Why gradient boosting stood out
Gradient Boosting, which builds a sequence of simple models that each correct the errors of the previous ones, consistently delivered the most accurate and reliable predictions. For grid voltage, it tracked the simulated values extremely closely, with a test R² of about 0.98 and typical percentage errors around a quarter of a percent; roughly 95% of voltage errors stayed within half a volt. For the stability score, it again came out on top, capturing over 93% of the variation with mean errors under one unit of the score. When the correlation-based feature weights were applied, its accuracy improved further, especially for stability prediction, and its error distributions became narrower and more uniform than those of competing models. This indicates not just high average performance, but dependable behavior across both calm and rapidly changing operating conditions.
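To make the "each model corrects the previous ones" idea concrete, here is a minimal sketch of gradient boosting for squared error, using one-split decision stumps as the simple models. It is a teaching example under stated assumptions, not the authors' configuration: the stump learner, the number of rounds, the learning rate, and the toy data are all hypothetical. Libraries such as scikit-learn provide full implementations.

```python
def fit_stump(X, residuals):
    """Find the one-feature threshold split that best fits the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    if best is None:
        # Every feature is constant: fall back to predicting the mean residual.
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gradient_boosting(X, y, n_rounds=100, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy problem (hypothetical): learn y = x^2 on 20 evenly spaced points.
X = [[i / 10] for i in range(20)]
y = [row[0] ** 2 for row in X]
model = fit_gradient_boosting(X, y)

baseline_mse = mean_squared_error(y, [sum(y) / len(y)] * len(y))
boosted_mse = mean_squared_error(y, [predict(model, row) for row in X])
```

The small learning rate means each stump removes only part of the remaining error, which is why many rounds are needed; this shrinkage is a standard way to trade training speed for smoother, more stable fits.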
What this means for future solar-rich grids
For non-specialists, the key message is that carefully designed machine learning tools can give grid operators a trustworthy preview of how a solar-heavy system will behave over the next few moments to minutes. By combining a physics-based digital twin, simple correlation analysis, and a strong ensemble method like Gradient Boosting, the framework can flag when voltages are likely to drift or when the system’s stability margin is shrinking. This, in turn, supports smarter inverter settings, better use of reactive power, and more targeted maintenance, reducing both outages and unnecessary curtailment of clean energy. In essence, the study shows that adding an interpretable layer of data-driven intelligence on top of existing control hardware can make renewable-rich grids more resilient, more efficient, and easier to manage.
Citation: Swarnkar, V., Ralhan, S., Singh, M. et al. Correlation based feature importance analysis for improving machine learning stability predictions in hybrid PV systems. Sci Rep 16, 10041 (2026). https://doi.org/10.1038/s41598-026-37270-y
Keywords: hybrid photovoltaic systems, grid voltage prediction, machine learning for power grids, gradient boosting, renewable grid stability