🤖 AI Summary
Problem: Household electricity time-series data suffer from scarcity, high heterogeneity, and insufficient coverage of rare scenarios (e.g., specific geographic regions, building types, or PV configurations).
Method: We propose a context-generalizable, high-fidelity time-series generation framework. It introduces a novel context-normalized inverse transformation mechanism, a scalable context encoder, and a jointly trained auxiliary context classification loss, enabling flexible modeling of arbitrary numbers and combinations of contextual variables. The framework integrates time-series generative models (GANs/diffusion), normalizing flows, and multi-task optimization.
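The context-normalized inverse transformation can be illustrated with a minimal sketch: each series is standardized by the statistics of its context group, and generated (normalized) series are mapped back to physical units. The `ContextNormalizer` class, the tuple-keyed context representation, and the mean-of-seen-statistics fallback for unseen contexts are all illustrative assumptions, standing in for the learned mechanism the paper describes.

```python
import numpy as np

class ContextNormalizer:
    """Per-context standardization with an inverse transform that also
    handles context combinations unseen during training (hypothetical
    sketch; the fallback here is a simple average of seen statistics)."""

    def __init__(self):
        self.stats = {}  # context tuple -> (mean, std)

    def fit(self, series_by_context):
        for ctx, series in series_by_context.items():
            x = np.asarray(series, dtype=float)
            self.stats[ctx] = (x.mean(), x.std() + 1e-8)

    def _lookup(self, ctx):
        if ctx in self.stats:
            return self.stats[ctx]
        # Unseen context: fall back to averaged seen statistics.
        means, stds = zip(*self.stats.values())
        return float(np.mean(means)), float(np.mean(stds))

    def transform(self, ctx, x):
        mean, std = self._lookup(ctx)
        return (np.asarray(x, dtype=float) - mean) / std

    def inverse_transform(self, ctx, z):
        mean, std = self._lookup(ctx)
        return np.asarray(z, dtype=float) * std + mean
```

Because the same statistics are used in both directions, any series round-trips exactly, including for context combinations never observed in training.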
Results: Experiments demonstrate that the generated data significantly outperform baselines in realism, diversity, and fidelity to rare scenarios across multiple quantitative metrics. The synthetic data effectively support robust training of energy foundation models on hybrid datasets comprising both synthetic and real samples.
📄 Abstract
Recent breakthroughs in large-scale generative modeling have demonstrated the potential of foundation models in domains such as natural language, computer vision, and protein structure prediction. However, their application in the energy and smart grid sector remains limited due to the scarcity and heterogeneity of high-quality data. In this work, we propose a method for creating high-fidelity electricity consumption time series data for rare and unseen context variables (e.g., location, building type, photovoltaics). Our approach, Context Encoding and Normalizing Time Series Generation, or CENTS, includes three key innovations: (i) a context normalization approach that enables inverse transformation for time series context variables unseen during training, (ii) a novel context encoder to condition any state-of-the-art time-series generator on arbitrary numbers and combinations of context variables, and (iii) a framework for training this context encoder jointly with a time-series generator using an auxiliary context classification loss designed to increase the expressivity of context embeddings and improve model performance. We further provide a comprehensive overview of different evaluation metrics for generative time series models. Our results highlight the efficacy of the proposed method in generating realistic household-level electricity consumption data, paving the way for training larger foundation models in the energy domain on synthetic as well as real-world data.
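Innovations (ii) and (iii) can be sketched together: one embedding table per context variable, pooled over whichever variables are present, with a per-variable classification head that tries to recover each context from the pooled embedding. The variable names, cardinalities, mean-pooling, and linear heads below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical context variables and cardinalities; any subset and
# number of variables may be supplied at once.
CARDS = {"region": 4, "building_type": 3, "has_pv": 2}
DIM = 8

# One embedding table per context variable.
tables = {k: rng.normal(size=(n, DIM)) for k, n in CARDS.items()}
# One linear classification head per variable (the auxiliary task).
heads = {k: rng.normal(size=(DIM, n)) for k, n in CARDS.items()}

def encode(context):
    """Mean-pool the embeddings of whichever variables are present."""
    vecs = [tables[k][v] for k, v in context.items()]
    return np.mean(vecs, axis=0)

def aux_class_loss(embedding, context):
    """Cross-entropy of predicting each supplied context variable
    back from the pooled embedding (the auxiliary loss)."""
    loss = 0.0
    for k, v in context.items():
        logits = embedding @ heads[k]
        log_probs = logits - np.log(np.sum(np.exp(logits)))
        loss -= log_probs[v]
    return loss / len(context)
```

During training, the pooled embedding conditions the time-series generator, and the total objective would combine the generator's loss with this auxiliary term (e.g., weighted by a coefficient), pushing the encoder toward embeddings that retain all supplied context information.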