DIM-SUM: Dynamic IMputation for Smart Utility Management

📅 2025-06-24

🤖 AI Summary

To address the complex, realistic missingness patterns found in infrastructure monitoring data, which differ from synthetically generated masks, this paper proposes DIM-SUM, a preprocessing framework that models real-world missingness through missing-pattern clustering and adaptive masking strategies. Theoretical learning guarantees support its goal of bridging the gap between artificially masked training data and real missing patterns. The method combines lightweight modeling with an efficient dynamic masking mechanism to support both robust training and efficient inference. Evaluated on over 2 billion readings from California water districts, electricity datasets, and benchmarks, the framework matches the accuracy of traditional methods while using significantly less training data and processing time. Compared with a large pre-trained model, it averages 2× higher imputation accuracy with significantly less inference time.

📝 Abstract

Time series imputation models have traditionally been developed using complete datasets with artificial masking patterns to simulate missing values. However, in real-world infrastructure monitoring, practitioners often encounter datasets where large amounts of data are missing and follow complex, heterogeneous patterns. We introduce DIM-SUM, a preprocessing framework for training robust imputation models that bridges the gap between artificially masked training data and real missing patterns. DIM-SUM combines pattern clustering and adaptive masking strategies with theoretical learning guarantees to handle diverse missing patterns actually observed in the data. Through extensive experiments on over 2 billion readings from California water districts, electricity datasets, and benchmarks, we demonstrate that DIM-SUM outperforms traditional methods by reaching similar accuracy with lower processing time and significantly less training data. When compared against a large pre-trained model, DIM-SUM averages 2x higher accuracy with significantly less inference time.
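The abstract does not say which features or clustering algorithm DIM-SUM uses to group missing patterns. As a minimal sketch, assuming each sensor window is a list of readings with `None` marking a gap, one could summarize each window's missingness (missing rate, longest outage, number of gaps) and group windows with plain k-means. `mask_features` and `kmeans` are hypothetical names chosen for illustration, not the paper's code.

```python
from typing import List, Optional, Sequence, Tuple

Window = Sequence[Optional[float]]  # one sensor window; None marks a missing reading
Point = Tuple[float, ...]


def mask_features(window: Window) -> Point:
    """Summarize missingness as (missing rate, longest gap / length, gap count / length).

    Hypothetical feature choice; the paper's actual features are not stated.
    """
    n = len(window)
    missing = [v is None for v in window]
    longest = run = gaps = 0
    for i, is_missing in enumerate(missing):
        if is_missing:
            run += 1
            if i == 0 or not missing[i - 1]:
                gaps += 1  # a new contiguous gap starts here
            longest = max(longest, run)
        else:
            run = 0
    return (sum(missing) / n, longest / n, gaps / n)


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 20) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center, then move centers to member means.
        labels = [min(range(k), key=lambda j: _dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centers


# Usage: isolated dropouts vs. long outages (toy data, not from the paper).
windows = [
    [1.0, None, 2.0, 3.0, 4.0, 5.0],
    [1.0, 2.0, None, 3.0, 4.0, 5.0],
    [None, None, None, None, 1.0, 2.0],
    [None, None, None, None, None, 2.0],
]
labels, centers = kmeans([mask_features(w) for w in windows], k=2)
# labels == [0, 0, 1, 1]: the two short dropouts share one cluster, the two outages the other
```

The point of grouping is that masks for training can later be drawn per pattern cluster instead of uniformly at random, so long outages and isolated dropouts are both represented in proportion to how they actually occur.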

Problem

Research questions and friction points this paper is trying to address.

Handling real-world missing data with complex patterns

Bridging gap between artificial and real missing data

Improving imputation accuracy with less training data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic imputation framework for real missing patterns

Combines pattern clustering and adaptive masking

Matches traditional methods' accuracy with less training data, and averages 2× higher accuracy than a large pre-trained model
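The page names adaptive masking but does not describe how it adapts. One plausible reading, offered only as an assumption, is to reuse real masks harvested from each missing-pattern cluster, apply them to complete training windows, and sample harder clusters (higher validation loss) more often. `cluster_weights`, `apply_mask`, and `make_batch` are hypothetical helpers, and the loss-proportional weighting with a floor is a swapped-in heuristic, not the paper's method.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Mask = Tuple[bool, ...]  # True = the reading was missing in the real data


def cluster_weights(cluster_loss: Dict[int, float], floor: float = 0.05) -> Dict[int, float]:
    """Map per-cluster validation loss to sampling probabilities that sum to 1.

    Hypothetical heuristic: harder clusters (higher loss) are drawn more often,
    and `floor` reserves a share spread evenly so easy patterns stay in rotation.
    """
    k = len(cluster_loss)
    total = sum(cluster_loss.values())
    if total == 0:
        return {c: 1.0 / k for c in cluster_loss}
    return {c: floor / k + (1 - floor) * loss / total for c, loss in cluster_loss.items()}


def apply_mask(window: Sequence[float], mask: Mask) -> Tuple[List[Optional[float]], List[int]]:
    """Hide masked positions of a complete window; return model input and hidden indices."""
    model_input = [None if m else v for v, m in zip(window, mask)]
    hidden = [i for i, m in enumerate(mask) if m]
    return model_input, hidden


def make_batch(
    complete_windows: List[Sequence[float]],
    masks_by_cluster: Dict[int, List[Mask]],
    cluster_loss: Dict[int, float],
    rng: random.Random,
) -> List[Tuple[List[Optional[float]], List[float], int]]:
    """Build (masked input, ground-truth targets, cluster id) training examples."""
    weights = cluster_weights(cluster_loss)
    clusters = sorted(weights)
    probs = [weights[c] for c in clusters]
    batch = []
    for window in complete_windows:
        cluster = rng.choices(clusters, weights=probs)[0]  # favor high-loss clusters
        mask = rng.choice(masks_by_cluster[cluster])  # a real mask observed in that cluster
        model_input, hidden = apply_mask(window, mask)
        batch.append((model_input, [window[i] for i in hidden], cluster))
    return batch


# Usage with toy masks and losses (illustrative values, not from the paper).
rng = random.Random(0)
masks_by_cluster = {0: [(False, True, False, False)], 1: [(True, True, True, False)]}
cluster_loss = {0: 0.1, 1: 0.9}
batch = make_batch([[1.0, 2.0, 3.0, 4.0]] * 4, masks_by_cluster, cluster_loss, rng)
```

Loss-proportional sampling is only one design choice; the property the sketch is meant to show is that training masks come from missingness actually observed in the data rather than from uniform random dropout.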