Clustering-based aggregate value regression

📅 2025-08-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses aggregate-value prediction, an important but underexplored statistical learning problem, exemplified by forecasting the total electricity demand of a region. It proposes Aggregate Value Regression (AVR), a framework designed specifically for aggregate targets that combines the individual regression models into a single model, and AVR-C, which groups the regression models by hierarchical clustering and performs AVR within each cluster. The number of clusters acts as a complexity parameter that mitigates the overparameterization arising when many models are combined. Theoretically, the paper develops a bias-variance trade-off analysis under model misspecification in which the number of clusters characterizes model complexity. Monte Carlo simulations examine how training and test errors behave as the number of clusters varies, and an application to electricity demand forecasting illustrates the bias-variance trade-off.
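The following is a minimal sketch of fitting AVR-C once a clustering is fixed. It rests on an assumed formulation that the summary does not confirm: each series j has its own covariates, the target is the total over all series, and series in the same cluster share one coefficient vector, so the aggregate is regressed on the covariates summed within each cluster. The array layout and the function names avr_c_design and fit_avr_c are hypothetical. Under this reading, one cluster gives p parameters, while one cluster per series reproduces the fully combined AVR with m*p parameters.

```python
import numpy as np

def avr_c_design(X, labels):
    """Aggregate design matrix for a given clustering (assumed formulation).

    X: array of shape (n, m, p), p covariates for each of m series at n
       time points (hypothetical layout).
    labels: length-m integer array assigning each series to cluster 0..K-1.
    Returns Z of shape (n, K*p): block k holds the covariates summed over
    the series in cluster k.
    """
    n, m, p = X.shape
    K = int(labels.max()) + 1
    Z = np.zeros((n, K * p))
    for k in range(K):
        Z[:, k * p:(k + 1) * p] = X[:, labels == k, :].sum(axis=1)
    return Z

def fit_avr_c(X, y_total, labels):
    """Least-squares fit of the aggregate target on cluster-summed covariates."""
    Z = avr_c_design(X, labels)
    coef, *_ = np.linalg.lstsq(Z, y_total, rcond=None)
    return coef

rng = np.random.default_rng(0)
n, m, p = 200, 6, 3
X = rng.normal(size=(n, m, p))
beta = rng.normal(size=(m, p))  # one coefficient vector per series
# Aggregate target: sum over series of x_j^T beta_j, plus noise.
y_total = np.einsum("nmp,mp->n", X, beta) + rng.normal(scale=0.5, size=n)

# The two extremes: a single pooled cluster, and one cluster per series.
for labels in (np.zeros(m, dtype=int), np.arange(m)):
    coef = fit_avr_c(X, y_total, labels)
    print(labels.max() + 1, "clusters ->", coef.size, "parameters")
```

With the sizes used here the loop reports 3 parameters for one cluster and 18 for six clusters; intermediate clusterings fall in between.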


📝 Abstract
In many practical situations, the main goal is to forecast aggregate values rather than individual ones. For instance, electricity companies are interested in forecasting the total electricity demand in a specific region to ensure reliable grid operation and resource allocation. However, to our knowledge, statistical learning methods specifically for forecasting aggregate values have not yet been well established. In particular, the relationship between forecast error and the number of clusters has not been well studied, as clustering is usually treated as unsupervised learning. This study introduces a novel forecasting method that focuses specifically on aggregate values in linear regression models. We call it Aggregate Value Regression (AVR); it is constructed by combining all regression models into a single model. When the number of regression models to be combined is large, AVR requires estimating a huge number of parameters, resulting in overparameterization. To address this issue, we introduce a hierarchical clustering technique, referred to as AVR-C (C stands for clustering): several clusters of regression models are constructed, and AVR is performed within each cluster. Under the assumption of a misspecified model, AVR-C leads to a novel bias-variance trade-off theory in which the number of clusters characterizes model complexity. Monte Carlo simulations are conducted to investigate the behavior of the training and test errors of the proposed clustering technique. The bias-variance trade-off theory is also demonstrated through an analysis of electricity demand forecasting.
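The Monte Carlo experiment mentioned in the abstract can be sketched as below, again under an assumed formulation (cluster-shared coefficients, with the aggregate regressed on within-cluster covariate sums). The data-generating process, the sample sizes, the spread parameter, and the nested "halving" partitions are illustrative choices of ours, not the paper's design; in AVR-C the partitions would come from hierarchical clustering. Because each finer partition refines the coarser one, the training error cannot increase as K grows, whereas the test error reflects the balance between pooling bias and estimation variance.

```python
import numpy as np

def design(X, labels):
    """Sum covariates within each cluster (assumed AVR-C design): (n, K*p)."""
    K = int(labels.max()) + 1
    return np.concatenate([X[:, labels == k, :].sum(axis=1) for k in range(K)],
                          axis=1)

def draw(rng, beta, n):
    """Simulate n time points of m series; return covariates and the total."""
    m, p = beta.shape
    X = rng.normal(size=(n, m, p))
    y_total = np.einsum("nmp,mp->n", X, beta) + rng.normal(size=n)
    return X, y_total

def monte_carlo(Ks=(1, 2, 4, 8), m=8, p=2, n_train=40, n_test=1000,
                spread=0.15, reps=200, seed=1):
    """Average training and test MSE of the aggregate forecast for each K.

    Every K in Ks must divide m so that the halving partitions are nested.
    """
    rng = np.random.default_rng(seed)
    train = {K: 0.0 for K in Ks}
    test = {K: 0.0 for K in Ks}
    for _ in range(reps):
        # Series-specific coefficients scattered around a common value, so
        # every K < m pools series that truly differ (misspecification).
        beta = 1.0 + spread * rng.normal(size=(m, p))
        X_tr, y_tr = draw(rng, beta, n_train)
        X_te, y_te = draw(rng, beta, n_test)
        for K in Ks:
            labels = np.arange(m) // (m // K)  # nested partitions by halving
            Z_tr, Z_te = design(X_tr, labels), design(X_te, labels)
            coef, *_ = np.linalg.lstsq(Z_tr, y_tr, rcond=None)
            train[K] += np.mean((y_tr - Z_tr @ coef) ** 2) / reps
            test[K] += np.mean((y_te - Z_te @ coef) ** 2) / reps
    return train, test

train, test = monte_carlo()
for K in train:
    print(f"K={K}: train MSE {train[K]:.3f}, test MSE {test[K]:.3f}")
```

Varying spread (how different the series really are) and n_train shifts which K gives the lowest test error, which is the behavior the bias-variance analysis is meant to describe.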
Problem

Research questions and friction points this paper is trying to address.

Forecasting aggregate values lacks established statistical learning methods
Combining many regression models into a single aggregate model leads to overparameterization
The bias-variance trade-off under misspecified models, with the number of clusters as the complexity parameter, has not been investigated
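A worked formulation may make the overparameterization point concrete. The notation and the within-cluster model are our assumptions, not taken from the paper: series j = 1, ..., m at time t follows y_jt = x_jt^T beta_j + eps_jt with p covariates, and the target is the total Y_t. Summing the m models gives the combined model, and letting the series in cluster C_k share one coefficient vector gamma_k gives the assumed AVR-C form.

```latex
% Combining all m models into one (AVR, as read from the abstract):
Y_t = \sum_{j=1}^{m} x_{jt}^\top \beta_j + e_t,
\qquad e_t = \sum_{j=1}^{m} \varepsilon_{jt},
\qquad \text{number of parameters} = mp.

% Assumed AVR-C form: clusters C_1, \dots, C_K, one coefficient vector each
Y_t = \sum_{k=1}^{K} \Bigl( \sum_{j \in C_k} x_{jt} \Bigr)^\top \gamma_k + e_t,
\qquad \text{number of parameters} = Kp.
```

With K = 1 this reduces to a single regression on the summed covariates (few parameters, but pooling bias when the beta_j differ); with K = m it recovers the fully combined model (no pooling bias, but mp parameters to estimate). Intermediate K trades one against the other.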
Innovation

Methods, ideas, or system contributions that make the work stand out.

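Hierarchical clustering of regression models (AVR-C), with AVR performed within each cluster
Aggregate Value Regression (AVR), which combines multiple regression models into a single model for the aggregate target
Bias-variance trade-off theory for misspecified models, with the number of clusters as the measure of model complexity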
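The abstract does not say which features or linkage AVR-C uses for its hierarchical clustering. One plausible reading, shown here purely as an assumption, is to estimate each series' own regression coefficients and cluster them with Ward's agglomerative method; the resulting nested partitions, one for each K from m down to 1, could then be passed to a within-cluster aggregate fit. The helpers per_series_coefficients and ward_partitions, and the requirement that individual responses be observed, are hypothetical. Ward merging is written out in NumPy to keep the sketch dependency-light; scipy.cluster.hierarchy.linkage with method="ward", followed by fcluster, is the usual library route.

```python
import numpy as np

def per_series_coefficients(X, Y):
    """OLS coefficients of each individual series.

    X: (n, m, p) covariates; Y: (n, m) individual responses (hypothetical layout).
    Returns an (m, p) array, one row per series.
    """
    return np.stack([np.linalg.lstsq(X[:, j, :], Y[:, j], rcond=None)[0]
                     for j in range(X.shape[1])])

def ward_partitions(B):
    """Ward agglomerative clustering of the rows of B.

    Returns {K: labels} for K = m, m-1, ..., 1. Each partition is obtained
    from the previous one by merging two clusters, so the partitions are nested.
    """
    m = B.shape[0]
    clusters = [[j] for j in range(m)]
    partitions = {}
    while True:
        labels = np.empty(m, dtype=int)
        for k, members in enumerate(clusters):
            labels[members] = k
        partitions[len(clusters)] = labels
        if len(clusters) == 1:
            return partitions
        # Find the pair whose merge least increases the within-cluster
        # sum of squares (Ward's criterion).
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                diff = B[clusters[a]].mean(axis=0) - B[clusters[b]].mean(axis=0)
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected

rng = np.random.default_rng(0)
n, p = 300, 2
true_B = np.array([[1.0, 0.0]] * 3 + [[-1.0, 2.0]] * 3)  # two groups of series
m = true_B.shape[0]
X = rng.normal(size=(n, m, p))
Y = np.einsum("nmp,mp->nm", X, true_B) + 0.3 * rng.normal(size=(n, m))
parts = ward_partitions(per_series_coefficients(X, Y))
print(parts[2])
```

With two well-separated groups of series, the K = 2 partition should place the first three series in one cluster and the last three in the other; because the partitions are nested, moving from K to K - 1 only ever merges two existing clusters.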
Kei Hirose
Kyushu University
Hidetoshi Matsui
Faculty of Data Science, Shiga University, 1-1-1, Banba, Hikone, Shiga, 522-8522, Japan
Hiroki Masuda
University of Tokyo
Statistics for stochastic processes and their applications