Clustering-based aggregate value regression

📅 2025-08-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses aggregate-value forecasting, a statistical learning problem that the authors describe as not yet well established, exemplified by forecasting total regional electricity demand. It proposes Aggregate Value Regression (AVR), which combines all individual linear regression models into a single model for the aggregate target, and AVR-C, which adds hierarchical clustering: regression models are grouped into clusters and AVR is performed within each cluster, so the number of clusters acts as a complexity-control parameter that mitigates the overparameterization arising when many models are combined. Theoretically, the paper develops a bias-variance trade-off analysis under model misspecification that relates forecast error to the number of clusters. Monte Carlo simulation is used to examine the behavior of training and test errors, and an analysis of electricity demand forecasting illustrates the trade-off.

📝 Abstract

In various practical situations, forecasting of aggregate values rather than individual ones is often our main focus. For instance, electricity companies are interested in forecasting the total electricity demand in a specific region to ensure reliable grid operation and resource allocation. However, to our knowledge, statistical learning specifically for forecasting aggregate values has not yet been well-established. In particular, the relationship between forecast error and the number of clusters has not been well studied, as clustering is usually treated as unsupervised learning. This study introduces a novel forecasting method specifically focused on the aggregate values in the linear regression model. We call it the Aggregate Value Regression (AVR), and it is constructed by combining all regression models into a single model. With the AVR, we must estimate a huge number of parameters when the number of regression models to be combined is large, resulting in overparameterization. To address the overparameterization issue, we introduce a hierarchical clustering technique, referred to as AVR-C (C stands for clustering). In this approach, several clusters of regression models are constructed, and the AVR is performed within each cluster. The AVR-C introduces a novel bias-variance trade-off theory under the assumption of a misspecified model. In this framework, the number of clusters characterizes model complexity. Monte Carlo simulation is conducted to investigate the behavior of training and test errors of our proposed clustering technique. The bias-variance trade-off theory is also demonstrated through the analysis of electricity demand forecasting.
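The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify how models are clustered or how AVR is parameterized within a cluster. The sketch assumes that each of N series has its own p-dimensional feature vector, that series are grouped by Ward hierarchical clustering of their individual OLS coefficients, and that series in the same cluster share one coefficient vector, so the cluster total is regressed on the cluster's summed features. The names `fit_avr_c` and `predict_total` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def fit_avr_c(X, y, n_clusters):
    """Fit one regression per cluster of series (illustrative assumption).

    X: array of shape (T, N, p), features for N series over T time points.
    y: array of shape (T, N), individual targets.
    Returns a dict mapping cluster label -> (series indices, coefficients).
    """
    T, N, p = X.shape
    # Per-series OLS coefficients, used here only to measure model similarity.
    coefs = np.stack([
        np.linalg.lstsq(X[:, i, :], y[:, i], rcond=None)[0] for i in range(N)
    ])
    # Ward hierarchical clustering, cut into at most n_clusters groups.
    tree = linkage(coefs, method="ward")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    models = {}
    for k in np.unique(labels):
        idx = np.where(labels == k)[0]
        X_k = X[:, idx, :].sum(axis=1)  # summed features of the cluster, (T, p)
        y_k = y[:, idx].sum(axis=1)     # aggregate target of the cluster, (T,)
        beta_k = np.linalg.lstsq(X_k, y_k, rcond=None)[0]
        models[k] = (idx, beta_k)
    return models


def predict_total(models, X):
    """Forecast the overall aggregate as the sum of cluster-level forecasts."""
    total = np.zeros(X.shape[0])
    for idx, beta_k in models.values():
        total += X[:, idx, :].sum(axis=1) @ beta_k
    return total
```

Under these assumptions the fitted model has at most `n_clusters * p` coefficients, so a small cluster count gives a rigid, possibly biased model and a large count gives a flexible, higher-variance one. Comparing training and test error of `predict_total` across values of `n_clusters` is one way to observe the trade-off the abstract describes.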

Problem

Research questions and friction points this paper is trying to address.

Forecasting aggregate values lacks established statistical learning methods

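Combining many regression models into a single aggregate model causes overparameterization, and the relationship between forecast error and the number of clusters has not been well studied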

Bias-variance trade-off theory under misspecified models requires investigation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical clustering of regression models (AVR-C), with the number of clusters characterizing model complexity

Aggregate Value Regression combining multiple models

Bias-variance trade-off theory for misspecified models
