🤖 AI Summary
Power load data exhibit extreme sparsity (62% missingness), severely impeding accurate forecasting. Method: This paper proposes a load forecasting framework integrating Gaussian process interpolation with the weak stationarity assumption for time series. First, Gaussian process regression probabilistically imputes missing values, preserving temporal uncertainty. Second, under the weak stationarity assumption, statistical features are systematically constructed and evaluated across multiple machine learning and deep learning models, with LSTM specifically employed to capture long-term temporal dependencies. Contribution/Results: Gaussian interpolation significantly enhances data quality and model robustness. LSTM achieves the highest prediction accuracy among all benchmarked models—reducing MAE by 18.7% and RMSE by 15.3%—demonstrating the framework’s effectiveness and practicality in extremely sparse scenarios. This work establishes a generalizable technical pathway for modeling sparse energy time-series data.
📝 Abstract
Sparsity, defined as the presence of missing or zero values in a dataset, often poses a major challenge while operating on real-life datasets. Sparsity in features or target data of the training dataset can be handled using various interpolation methods, such as linear or polynomial interpolation, spline, moving average, or can be simply imputed. Interpolation methods usually perform well with Strict Sense Stationary (SSS) data. In this study, we show that an approximately 62% sparse dataset with hourly load data of a power plant can be utilized for load forecasting assuming the data is Wide Sense Stationary (WSS), if augmented with Gaussian interpolation. More specifically, we perform statistical analysis on the data, and train multiple machine learning and deep learning models on the dataset. By comparing the performance of these models, we empirically demonstrate that Gaussian interpolation is a suitable option for dealing with load forecasting problems. Additionally, we demonstrate that Long Short-term Memory (LSTM)-based neural network model offers the best performance among a diverse set of classical and neural network-based models.