Clear Sky Science

Enhancing wind and solar energy forecasting through time-series feature engineering and ensemble machine learning


Why better clean energy forecasts matter

As wind turbines and solar panels supply more of our electricity, their natural ups and downs make it harder to keep the lights on. Grid operators need to know not just how much power is being generated right now, but how that output is likely to change over the next few hours. This study explores how advanced data analysis and machine learning can turn years of wind and solar records into sharper short-term forecasts that help balance supply and demand, reduce waste, and support a more reliable low-carbon grid.

From raw power readings to smarter signals

The researchers worked with nearly six years of hourly wind and solar power data from across France, covering more than fifty thousand time points. Rather than feeding these raw numbers straight into a model, they reshaped them into richer signals. They added information about what the output had been an hour, a day, or even longer ago, computed short-term averages and variability, and encoded calendar patterns such as time of day, day of week, and season using circular (sine and cosine) functions that reflect daily and yearly cycles. They also checked carefully for redundant information and for hidden leaks from future data, so that the models would be judged on realistic forecasting tasks rather than on accidentally seeing the answers in advance.
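To make these feature steps concrete, here is a minimal Python sketch, assuming an hourly series held in plain lists. The lag choices (1 and 24 hours), the six-hour rolling window, the 365.25-day yearly period, and the names `make_features` and `cyclic` are illustrative assumptions, not the settings reported in the study. The leakage guard is that every input for hour t is drawn only from values strictly before t.

```python
from __future__ import annotations

import math
from datetime import datetime, timedelta
from statistics import mean, pstdev


def cyclic(value: float, period: float) -> tuple[float, float]:
    """Map a periodic quantity (e.g. hour of day) onto the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def make_features(power: list[float], times: list[datetime],
                  lags: tuple[int, ...] = (1, 24), window: int = 6) -> list[dict]:
    """Build one feature row per hour t whose target is power[t].

    Every input comes from power[:t], i.e. strictly before the target hour,
    so no future value can leak into the features.
    """
    start = max(max(lags), window)  # first hour with a complete history
    rows = []
    for t in range(start, len(power)):
        past = power[t - window:t]  # the `window` hours just before t
        row = {f"lag_{k}": power[t - k] for k in lags}
        row["roll_mean"] = mean(past)   # short-term level
        row["roll_std"] = pstdev(past)  # short-term variability
        row["hour_sin"], row["hour_cos"] = cyclic(times[t].hour, 24)
        row["dow_sin"], row["dow_cos"] = cyclic(times[t].weekday(), 7)
        doy = times[t].timetuple().tm_yday
        row["doy_sin"], row["doy_cos"] = cyclic(doy, 365.25)  # assumed yearly period
        row["target"] = power[t]
        rows.append(row)
    return rows


# Usage on a synthetic two-day ramp (illustrative data only).
t0 = datetime(2020, 1, 1)
times = [t0 + timedelta(hours=i) for i in range(48)]
power = [float(i) for i in range(48)]
rows = make_features(power, times)
print(len(rows), rows[0]["lag_1"], rows[0]["roll_mean"], rows[0]["target"])  # 24 23.0 20.5 24.0
```

Encoding the hour as a sine/cosine pair places 23:00 right next to 00:00, whereas a raw 0 to 23 hour number would put them at opposite ends of the scale.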

Figure 1. Turning past wind and sunlight patterns into clearer short-term forecasts for a stable clean energy grid.

Teaching machines to follow the weather’s rhythm

With this engineered time series in hand, the team tested a wide range of forecasting methods. Classical statistical models such as ARIMA, which assume relatively simple linear patterns, were compared with more flexible machine learning systems, including gradient-boosted decision trees and deep neural networks. Two tree-based ensembles, CatBoost and LightGBM, stood out. These methods build many small decision trees in sequence, each one correcting errors left by the trees before it, and add them together into a single strong predictor. By using a strict forward-rolling evaluation scheme, in which each new forecast is made only from past data, the authors ensured that the performance numbers would resemble what could be expected in real grid operations.
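The forward-rolling idea can be sketched as rolling-origin evaluation: train on everything before a forecast origin, score the next few hours, then move the origin forward and repeat. In this hypothetical sketch the naive persistence forecaster is only a placeholder; a tree ensemble such as CatBoost or LightGBM could be wrapped in the same `fit` role. The function names and split sizes are assumptions for illustration, not the study's configuration.

```python
from __future__ import annotations

from typing import Callable, Iterator, Sequence

# A "fitter" takes the training history and returns a forecaster that,
# given a number of steps, predicts that many future values.
Forecaster = Callable[[int], list[float]]
Fitter = Callable[[Sequence[float]], Forecaster]


def rolling_origin_splits(n: int, initial_train: int, test_size: int,
                          step: int | None = None) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges; every test index comes after all train indices."""
    step = step or test_size
    origin = initial_train
    while origin + test_size <= n:
        yield range(0, origin), range(origin, origin + test_size)
        origin += step  # roll the forecast origin forward in time


def walk_forward_mae(y: Sequence[float], fit: Fitter,
                     initial_train: int, test_size: int) -> float:
    """Mean absolute error pooled over all rolling-origin test windows."""
    errors = []
    for train, test in rolling_origin_splits(len(y), initial_train, test_size):
        forecast = fit([y[i] for i in train])  # the model sees only the past
        preds = forecast(len(test))
        errors.extend(abs(p - y[i]) for p, i in zip(preds, test))
    if not errors:
        raise ValueError("series too short for the requested split")
    return sum(errors) / len(errors)


def persistence(history: Sequence[float]) -> Forecaster:
    """Naive baseline: repeat the last observed value for every future step."""
    last = history[-1]
    return lambda steps: [last] * steps


y = [float(i) for i in range(10)]
print(list(rolling_origin_splits(len(y), initial_train=6, test_size=2)))
print(walk_forward_mae(y, persistence, initial_train=6, test_size=2))  # 1.5
```

Unlike a shuffled train/test split, no test index ever precedes a training index, which is what keeps the reported error representative of a real forecasting task.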

How far ahead can we really see?

The study examined forecasts from one hour up to a full day ahead, for wind and solar power separately. For wind, the best models captured most of the variation at the one-hour horizon, with performance gradually declining as the lead time increased. Up to around six hours ahead the forecasts still carried useful information, but at twelve to twenty-four hours the growing influence of changing weather made predictions much less certain. Solar power proved even harder to anticipate at longer horizons, because cloud cover and other fast-moving factors can change quickly in ways that are not visible in past power output alone. The models did a decent job for the next hour or so, especially on clear days, but beyond a few hours their skill dropped sharply.
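One common way to produce forecasts at several lead times is a direct strategy, with a separate training set (and model) for each horizon. This summary does not say whether the authors used a direct or a recursive approach, so treat the following as an assumed, illustrative sketch; `direct_datasets` and `persistence_mae_by_horizon` are hypothetical names.

```python
from __future__ import annotations


def direct_datasets(power: list[float], horizons: list[int],
                    n_lags: int) -> dict[int, tuple[list[list[float]], list[float]]]:
    """Build one supervised dataset per horizon h (a "direct" strategy).

    Each input row holds the n_lags most recent values up to hour t,
    and its target is the value h hours later, power[t + h].
    """
    datasets = {}
    for h in horizons:
        X, y = [], []
        for t in range(n_lags - 1, len(power) - h):
            X.append(power[t - n_lags + 1:t + 1])
            y.append(power[t + h])
        datasets[h] = (X, y)
    return datasets


def persistence_mae_by_horizon(datasets: dict) -> dict[int, float]:
    """Error of the naive 'no change' forecast (last input value), per horizon."""
    return {h: sum(abs(row[-1] - target) for row, target in zip(X, y)) / len(y)
            for h, (X, y) in datasets.items()}


power = [float(i) for i in range(10)]  # steadily rising toy series
data = direct_datasets(power, horizons=[1, 3], n_lags=2)
print(data[3][0][0], data[3][1][0])      # [0.0, 1.0] 4.0
print(persistence_mae_by_horizon(data))  # {1: 1.0, 3: 3.0}
```

On this toy ramp the naive error grows in step with the horizon, a simple picture of why the most recent reading says less and less about conditions further ahead.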

What the models actually pay attention to

By systematically removing groups of input features, the authors probed which pieces of information mattered most. Recent power levels (the lagged values) were the single dominant ingredient, confirming that what just happened is usually the best clue to what comes next. Rolling averages and measures of short-term variability also played a major role, especially when the system was transitioning between calm and windy or between cloudy and sunny conditions. Calendar and cyclic features, such as hour of day encoded on a circle, became more important at longer horizons, where broad daily and seasonal patterns matter more than hour-to-hour fluctuations. Deep learning models based on recurrent neural networks could follow complex swings in production, but the best-tuned tree ensembles matched or exceeded their accuracy at lower computational cost.
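Removing feature groups one at a time is often called a leave-one-group-out ablation. A minimal sketch follows, assuming an `evaluate` callback that retrains a model on a given feature list and returns its error. The group names, the toy scoring function, and its numbers are invented for illustration and are not results from the study.

```python
from __future__ import annotations

from typing import Callable, Mapping, Sequence


def group_ablation(groups: Mapping[str, Sequence[str]],
                   evaluate: Callable[[list[str]], float]) -> dict[str, float]:
    """Leave-one-group-out ablation.

    `evaluate` should retrain and score a model on the given feature names
    (lower error is better). Each group's impact is how much the error rises
    when that group alone is removed; results are sorted from largest impact down.
    """
    all_features = [f for feats in groups.values() for f in feats]
    baseline = evaluate(all_features)
    impact = {}
    for name, feats in groups.items():
        dropped = set(feats)
        kept = [f for f in all_features if f not in dropped]
        impact[name] = evaluate(kept) - baseline
    return dict(sorted(impact.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical groups and a made-up scoring function, for illustration only.
GROUPS = {
    "lags": ["lag_1", "lag_24"],
    "rolling": ["roll_mean", "roll_std"],
    "calendar": ["hour_sin", "hour_cos"],
}
TOY_GAIN = {"lag_1": 0.30, "lag_24": 0.10, "roll_mean": 0.08,
            "roll_std": 0.04, "hour_sin": 0.02, "hour_cos": 0.02}


def toy_evaluate(features: list[str]) -> float:
    # Stand-in for "retrain and measure error": each kept feature lowers the
    # error by an invented amount. These are not numbers from the study.
    return 1.0 - sum(TOY_GAIN[f] for f in features)


print(group_ablation(GROUPS, toy_evaluate))
```

With a real model, `evaluate` is the expensive step, since every group removal means a full retrain, ideally under the same forward-rolling scheme used for the main results.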

Figure 2. Step-by-step process in which recent renewable power data are transformed into multi-hour forecasts using layered models.

What this means for the future grid

For a general reader, the key message is that careful preparation of time-stamped data and thoughtful model testing can make a real difference in how well we can anticipate the output of wind and solar farms. Sophisticated but practical machine learning methods can provide reliable forecasts for the next few hours, the time window most important for day-to-day grid balancing. At the same time, the study shows that looking a full day ahead remains difficult when relying only on past production, especially for solar power. To push forecasting further, future tools will need to blend these data-driven techniques with detailed weather information and physical knowledge about turbines and panels, helping grids stay stable as clean energy takes on a larger share of the load.

Citation: Elmunim, N.A., Khlifi, M.A., Aldawsari, M.A. et al. Enhancing wind and solar energy forecasting through time-series feature engineering and ensemble machine learning. Sci Rep 16, 15546 (2026). https://doi.org/10.1038/s41598-026-49373-7

Keywords: renewable energy forecasting, wind power prediction, solar power prediction, machine learning, time series features