Clear Sky Science
Deployable knowledge–data hybrid models for day-ahead cooling load prediction under data scarcity: a case study and performance validation
Why predicting building cooling matters
As cities grow and heat waves become more frequent, large office buildings rely heavily on air conditioning. Knowing tomorrow’s cooling demand in advance helps building operators buy electricity wisely, run chillers efficiently, and support a power grid that is adding more solar and wind. Yet the newest and most efficient buildings often have very little data from their first months of operation, which makes it hard for standard artificial intelligence tools to forecast their cooling needs reliably.

The challenge of limited data
Many current prediction tools treat a building as a black box. They feed in past energy use, weather, and schedules, and let a learning algorithm search for patterns. These data-hungry models can work well when years of high-quality records are available. But a new or newly renovated building has only a short history. Under these conditions, purely data-based models tend to latch onto quirks in the limited data, miss sudden jumps in demand, and give forecasts that swing widely from day to day. This is especially problematic for day-ahead planning, when operators must schedule cooling equipment and interact with the electricity market a full 24 hours in advance.
Blending simple physics with modern learning
The study introduces a practical middle road between detailed physics simulations and pure data mining. Instead of trying to model every source of heat in the building, the authors focus on two contributions that can be computed from information most buildings already have: the heat that comes in with outside fresh air and the heat that leaks through the walls and windows. Using basic heat transfer formulas, they turn weather forecasts, glass properties, and ventilation schedules into rough, physically sensible estimates of these loads. These estimates do not replace measured cooling demand but are added as extra inputs that guide a deep learning model combining convolutional and recurrent network layers.
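A minimal sketch of how such physics-guided inputs might be computed and attached to each hourly record is given here. It is not the authors' implementation: the formulas are the textbook sensible-heat and steady-state conduction relations, and the constants, default values, and function names (`fresh_air_load_kw`, `envelope_load_kw`, `build_features`) are assumptions made for illustration. The sketch ignores humidity and solar gain through the glass, which a fuller model would likely include.

```python
# Assumed air properties; illustrative values, not taken from the study.
AIR_DENSITY_KG_M3 = 1.2    # kg/m^3
AIR_CP_KJ_KG_K = 1.005     # kJ/(kg*K)


def fresh_air_load_kw(flow_m3_s, t_out_c, t_in_c):
    """Sensible load of ventilation air: rho * V * cp * (T_out - T_in), floored at 0."""
    delta_t = t_out_c - t_in_c
    return max(0.0, AIR_DENSITY_KG_M3 * flow_m3_s * AIR_CP_KJ_KG_K * delta_t)


def envelope_load_kw(u_w_m2k, area_m2, t_out_c, t_in_c):
    """Steady-state conduction through walls and windows: U * A * (T_out - T_in)."""
    delta_t = t_out_c - t_in_c
    return max(0.0, u_w_m2k * area_m2 * delta_t / 1000.0)  # W -> kW


def build_features(past_load_kw, t_out_forecast_c, flow_schedule_m3_s,
                   t_in_c=25.0, u_w_m2k=2.0, area_m2=5000.0):
    """Attach the two rough physics estimates to each hourly record.

    The indoor setpoint, U-value, and facade area defaults are hypothetical.
    """
    rows = []
    for load, t_out, flow in zip(past_load_kw, t_out_forecast_c, flow_schedule_m3_s):
        rows.append([
            load,                                               # measured history
            t_out,                                              # weather forecast
            fresh_air_load_kw(flow, t_out, t_in_c),             # physics hint 1
            envelope_load_kw(u_w_m2k, area_m2, t_out, t_in_c),  # physics hint 2
        ])
    return rows
```

Each row could then be passed to a learning model alongside the raw inputs. The estimates only need to be roughly right, since they act as guiding signals rather than as the forecast itself.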
Testing the hybrid idea in a real office tower
The approach was tested on a 23-story office building in Hangzhou, China, with a modern glass facade and central cooling system. The researchers used one cooling season of hourly data, about 4,300 hours in total, then artificially restricted how much of this history each model could see during training. In some tests, the models could learn from only 10 percent of the data, or about 430 hours, equivalent to roughly two and a half weeks of records. Of the four versions of the predictor, three used physics-based fresh-air or wall-and-window loads as guiding signals, while the fourth relied solely on past cooling and weather data. All models attempted to forecast the next day's cooling profile hour by hour.
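One way to mimic this kind of data-scarcity experiment is to sample a fixed fraction of whole days from the season and train only on those. The sketch below assumes random sampling of complete days with a seed; the summary does not say how the authors selected their subsets, and the function name `sample_training_days` is hypothetical.

```python
import random


def sample_training_days(n_hours, fraction, seed):
    """Pick a reproducible random subset of whole days for training.

    Assumption: records are hourly and contiguous, so day d covers
    hours [24 * d, 24 * d + 24). Any trailing partial day is ignored.
    """
    n_days = n_hours // 24
    n_keep = max(1, round(n_days * fraction))
    rng = random.Random(seed)  # seeded so the split can be repeated
    return sorted(rng.sample(range(n_days), n_keep))


# About 4,300 hourly records at a 10 percent budget.
days = sample_training_days(4300, 0.10, seed=0)
hours_kept = len(days) * 24
```

Repeating the draw with different seeds gives several training subsets of the same size, which makes it possible to see how much a model's error depends on which days it happened to learn from.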

More accurate and steadier predictions
When training data were scarce, the differences between the approaches were stark. With only 10 percent of the data available, the purely data-driven model often missed the sharp morning ramp-up in cooling as workers arrived and underestimated hot afternoon peaks. Its errors varied widely depending on which days were used for training. In contrast, all three hybrid versions tracked the timing and height of peaks much more closely and showed far less scatter in their errors. On average, the hybrid models cut the typical prediction error by about half and reduced the spread of errors by nearly an order of magnitude compared with the baseline. The simplest variant, which used only the fresh-air load as extra information, offered an especially attractive balance of accuracy, stability, and ease of setup.
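The distinction between typical error and spread of errors can be made concrete with a short sketch. It assumes mean absolute error as the per-run metric and the sample standard deviation across repeated training runs as the measure of spread; the study may use different metrics, and the numbers below are toy values, not results from the paper.

```python
from statistics import mean, stdev


def mean_absolute_error(actual, predicted):
    """Average size of the hourly forecast miss, in the units of the load."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def summarize_runs(errors_per_run):
    """Return (typical error, spread) across repeated training runs."""
    return mean(errors_per_run), stdev(errors_per_run)


# Toy example: one three-hour forecast compared with measurements.
toy_mae = mean_absolute_error([1.0, 2.0, 3.0], [1.0, 3.0, 5.0])

# Toy example: per-run errors from two hypothetical training subsets.
toy_typical, toy_spread = summarize_runs([1.0, 3.0])
```

A model with a low typical error but a large spread would still be risky in practice, because its quality would depend heavily on which few weeks of data it was trained on.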
What this means for real buildings
For building owners and energy managers, the main message is that a little physics goes a long way. Folding simple, easy-to-compute estimates of how fresh air and the building envelope add to cooling needs into a learning model makes useful day-ahead forecasts possible even during the early months of operation, when historical data are still thin. The study shows that this knowledge-data hybrid approach can tame overfitting, keep training costs modest, and provide reliable guidance for scheduling chillers and storage. In plain terms, combining basic engineering insight with modern data tools helps buildings stay comfortable, cut waste, and cooperate more smoothly with a changing power grid.
Citation: Chen, J., Sun, T., Zhang, Y. et al. Deployable knowledge–data hybrid models for day-ahead cooling load prediction under data scarcity: a case study and performance validation. Sci Rep 16, 15079 (2026). https://doi.org/10.1038/s41598-026-45325-3
Keywords: cooling load prediction, building energy, hybrid modeling, data scarcity, deep learning