IISE PG&E Energy Analytics Challenge 2025: Hourly-Binned Regression Models Beat Transformers in Load Forecasting

📅 2025-05-16
🤖 AI Summary
This work addresses long-term hourly electricity load forecasting under realistic constraints: limited training data and multiple exogenous covariates (e.g., temperature, irradiance). It proposes an *hourly-binned regression* approach that decomposes the 24-hour cycle into 24 independent regression subproblems, one per hour of day. The pipeline integrates the external covariates, applies PCA for dimensionality reduction, trains 24 separate XGBoost models, and stacks their outputs to produce year-long forecasts. The approach abandons end-to-end temporal modeling, and the experiments indicate that, under small-sample and multi-source-covariate conditions, conventional tree-based models outperform Transformer-based architectures (e.g., TimeGPT). The authors argue for choosing models based on dataset characteristics rather than architectural complexity. On data from five real-world sites, XGBoost achieves the lowest MAE while remaining efficient in training and inference, which supports the practical viability of lightweight models for long-term load forecasting.

📝 Abstract
Accurate electricity load forecasting is essential for grid stability, resource optimization, and renewable energy integration. While transformer-based deep learning models like TimeGPT have gained traction in time-series forecasting, their effectiveness in long-term electricity load prediction remains uncertain. This study evaluates forecasting models ranging from classical regression techniques to advanced deep learning architectures using data from the ESD 2025 competition. The dataset includes two years of historical electricity load data, alongside temperature and global horizontal irradiance (GHI) across five sites, with a one-day-ahead forecasting horizon. Because actual test-set load values remain undisclosed, feeding predicted values back in as inputs would accumulate errors, which makes this a long-term forecasting challenge. We (i) apply Principal Component Analysis (PCA) for dimensionality reduction, (ii) frame the task as a regression problem, using temperature and GHI as covariates to predict load for each hour, and (iii) stack 24 models to generate yearly forecasts. Our results reveal that deep learning models, including TimeGPT, fail to consistently outperform simpler statistical and machine learning approaches due to the limited availability of training data and exogenous variables. In contrast, XGBoost, with minimal feature engineering, delivers the lowest error rates across all test cases while maintaining computational efficiency. This highlights the limitations of deep learning in long-term electricity forecasting and reinforces the importance of model selection based on dataset characteristics rather than complexity. Our study provides insights into practical forecasting applications and contributes to the ongoing discussion on the trade-offs between traditional and modern forecasting methods.
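The abstract names PCA as a dimensionality-reduction step but does not say which features it is applied to. The sketch below assumes, purely for illustration, that the temperature and GHI series from the five sites are stacked into a 10-column matrix and projected onto two principal components. The function name `pca_reduce`, the synthetic data, and the choice of two components are hypothetical, not taken from the paper.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA requires centered data
    # Rows of Vt are the principal directions, ordered by singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = (s ** 2) / np.sum(s ** 2)
    return Xc @ components.T, components, mean, explained_ratio[:n_components]

# Synthetic stand-in for the covariates (assumption): each of the five
# sites observes the same underlying weather signal plus small local noise.
rng = np.random.default_rng(0)
n_hours = 24 * 30
base_temp = rng.normal(size=(n_hours, 1))
base_ghi = rng.normal(size=(n_hours, 1))
temps = base_temp + 0.1 * rng.normal(size=(n_hours, 5))
ghis = base_ghi + 0.1 * rng.normal(size=(n_hours, 5))
X = np.hstack([temps, ghis])  # shape (720, 10)

Z, components, mean, explained = pca_reduce(X, n_components=2)
print(Z.shape, float(explained.sum()))
```

When the sites share most of their weather signal, as in this synthetic setup, two components retain nearly all of the variance, so the per-hour regressors see far fewer inputs. Whether the same holds for the competition data is not stated in the source.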
Problem

Research questions and friction points this paper is trying to address.

Evaluating model performance for long-term electricity load forecasting
Comparing transformer-based and classical regression forecasting techniques
Assessing impact of limited training data on deep learning models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses PCA for dimensionality reduction
Frames task as hourly regression problem
Stacks 24 models for yearly forecasts
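The hourly-binning idea listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: ordinary least squares (via `numpy.linalg.lstsq`) stands in for XGBoost so the example has no dependencies beyond NumPy, the data are synthetic, and the helper names (`fit_hourly_models`, `predict_hourly`, `mae`) are hypothetical.

```python
import numpy as np

def fit_hourly_models(hours, X, y):
    """Fit one least-squares model per hour of day (0..23)."""
    models = {}
    for h in range(24):
        mask = hours == h
        A = np.column_stack([np.ones(mask.sum()), X[mask]])  # intercept + covariates
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        models[h] = coef
    return models

def predict_hourly(models, hours, X):
    """Route each row to its hour's model and stack the outputs in time order."""
    y_hat = np.empty(len(hours))
    for h in range(24):
        mask = hours == h
        A = np.column_stack([np.ones(mask.sum()), X[mask]])
        y_hat[mask] = A @ models[h]
    return y_hat

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

# Synthetic data (assumption): the baseline load and the temperature
# sensitivity both vary with the hour of day.
rng = np.random.default_rng(0)
n_days = 90
n = 24 * n_days
hours = np.tile(np.arange(24), n_days)
temp = rng.normal(20.0, 5.0, n)
ghi = rng.uniform(0.0, 1.0, n)
base = 50 + 10 * np.sin(2 * np.pi * np.arange(24) / 24)
slope = 1 + 0.5 * np.cos(2 * np.pi * np.arange(24) / 24)
load = base[hours] + slope[hours] * temp + 5 * ghi + rng.normal(0, 1, n)
X = np.column_stack([temp, ghi])

split = 24 * 60  # first 60 days for training, last 30 for testing
models = fit_hourly_models(hours[:split], X[:split], load[:split])
y_hat = predict_hourly(models, hours[split:], X[split:])

# Baseline for comparison: a single model pooled over all hours.
A_train = np.column_stack([np.ones(split), X[:split]])
pooled, *_ = np.linalg.lstsq(A_train, load[:split], rcond=None)
y_pooled = np.column_stack([np.ones(n - split), X[split:]]) @ pooled

mae_hourly = mae(load[split:], y_hat)
mae_pooled = mae(load[split:], y_pooled)
print(mae_hourly, mae_pooled)
```

Because each model predicts only from covariates for its own hour, no predicted load value is ever fed back as an input, which is how this formulation avoids the error accumulation the abstract describes. Swapping the least-squares fit for a gradient-boosted regressor changes only the body of the per-hour fit and predict steps.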
Millend Roy
Columbia University
Energy Markets, Microgrids, Decision Making, Predictive Analytics
Vladimir Pyltsov
Columbia University, New York, NY, USA
Yinbo Hu
Columbia University, New York, NY, USA