DIM-SUM: Dynamic IMputation for Smart Utility Management

📅 2025-06-24
🤖 AI Summary
Missingness in infrastructure monitoring data is complex and realistic, and differs from the synthetically generated masks usually used for training. To address this, the paper proposes a dynamic preprocessing framework that models real-world missingness through missing-pattern clustering and adaptive masking strategies, backed by theoretical learning guarantees, to bridge the gap between artificially masked training data and real missing patterns. The method pairs lightweight modeling with an efficient dynamic masking mechanism, aiming for robust training and efficient inference. Evaluated on over 2 billion readings from California water districts, electricity datasets, and benchmarks, the framework reaches accuracy similar to traditional methods with lower processing time and significantly less training data. Compared with a large pre-trained model, it averages 2x higher accuracy with significantly less inference time.

📝 Abstract
Time series imputation models have traditionally been developed using complete datasets with artificial masking patterns to simulate missing values. However, in real-world infrastructure monitoring, practitioners often encounter datasets where large amounts of data are missing and follow complex, heterogeneous patterns. We introduce DIM-SUM, a preprocessing framework for training robust imputation models that bridges the gap between artificially masked training data and real missing patterns. DIM-SUM combines pattern clustering and adaptive masking strategies with theoretical learning guarantees to handle diverse missing patterns actually observed in the data. Through extensive experiments on over 2 billion readings from California water districts, electricity datasets, and benchmarks, we demonstrate that DIM-SUM outperforms traditional methods by reaching similar accuracy with lower processing time and significantly less training data. When compared against a large pre-trained model, DIM-SUM averages 2x higher accuracy with significantly less inference time.

Problem

Research questions and friction points this paper is trying to address.

Handling real-world missing data with complex patterns
Bridging gap between artificial and real missing data
Improving imputation accuracy with less training data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic imputation framework for real missing patterns
Combines pattern clustering and adaptive masking
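Matches the accuracy of traditional methods with less training data and processing time
Averages 2x higher accuracy than a large pre-trained model, with less inference time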
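
The general idea of training on observed rather than synthetic missingness can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`extract_masks`, `cluster_masks`, `apply_real_mask`), the use of plain k-means on binary masks, and the uniform sampling over clusters are all assumptions made for illustration, and the paper's actual clustering and adaptive masking strategies may differ.

```python
import numpy as np

def extract_masks(series, window):
    """Cut a 1-D series containing NaNs into fixed-length windows and
    return one boolean mask per window (True = value is missing)."""
    n = len(series) // window
    return np.isnan(series[: n * window].reshape(n, window))

def cluster_masks(masks, k, iters=20, seed=0):
    """Group masks with plain k-means (a hypothetical stand-in for the
    paper's pattern clustering). Requires k <= number of masks."""
    rng = np.random.default_rng(seed)
    X = masks.astype(float)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every mask to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def apply_real_mask(complete_window, masks, labels, rng):
    """Hide values in a complete window using a mask observed in real data.
    A cluster is picked uniformly first (an assumption of this sketch),
    so rare missing patterns are sampled as often as common ones."""
    cluster = rng.choice(np.unique(labels))
    mask = masks[rng.choice(np.flatnonzero(labels == cluster))]
    masked = complete_window.astype(float).copy()
    masked[mask] = np.nan
    return masked, mask
```

In use, `extract_masks` would be run over the real, incomplete readings, and `apply_real_mask` would then produce training pairs from complete windows: the masked window is the model input and the hidden values are the imputation targets.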