🤖 AI Summary
Real-world infrastructure monitoring data exhibit complex, heterogeneous missingness patterns that differ from the synthetically generated masks typically used to train imputation models. To address this, the paper proposes DIM-SUM, a preprocessing framework that models the missingness patterns actually observed in the data through missing-pattern clustering and adaptive masking strategies, backed by theoretical learning guarantees, in order to close the gap between artificially masked training data and real missing patterns. The method integrates lightweight modeling with an efficient dynamic masking mechanism, targeting both training robustness and inference efficiency. Evaluated on over 2 billion readings from California water districts, electricity datasets, and benchmarks, the framework reaches accuracy similar to that of traditional methods with lower processing time and significantly less training data. Compared with a large pre-trained model, it averages 2x higher accuracy with significantly less inference time.
📝 Abstract
Time series imputation models have traditionally been developed using complete datasets with artificial masking patterns to simulate missing values. However, in real-world infrastructure monitoring, practitioners often encounter datasets where large amounts of data are missing and follow complex, heterogeneous patterns. We introduce DIM-SUM, a preprocessing framework for training robust imputation models that bridges the gap between artificially masked training data and real missing patterns. DIM-SUM combines pattern clustering and adaptive masking strategies with theoretical learning guarantees to handle diverse missing patterns actually observed in the data. Through extensive experiments on over 2 billion readings from California water districts, electricity datasets, and benchmarks, we demonstrate that DIM-SUM outperforms traditional methods by reaching similar accuracy with lower processing time and significantly less training data. When compared against a large pre-trained model, DIM-SUM averages 2x higher accuracy with significantly less inference time.
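The idea of training on masks that resemble real missingness, rather than uniformly random ones, can be illustrated with a small sketch. This is not the DIM-SUM algorithm: the abstract does not specify its clustering features or masking rules, so everything below is an assumption made for illustration. The helper names (`mask_features`, `kmeans`, `sample_training_pair`), the two features (missing rate and longest-gap fraction), and the toy data are all hypothetical. The sketch groups observed missingness masks into clusters, then hides values in a complete window using a mask drawn from one of those clusters.

```python
import numpy as np

def mask_features(masks):
    """Summarise each boolean mask (True = missing) by two assumed features:
    overall missing rate and longest consecutive gap as a fraction of length."""
    masks = np.asarray(masks, dtype=bool)
    n, length = masks.shape
    rate = masks.mean(axis=1)
    longest = np.zeros(n)
    for i, m in enumerate(masks):
        run = best = 0
        for v in m:
            run = run + 1 if v else 0
            best = max(best, run)
        longest[i] = best / length
    return np.column_stack([rate, longest])

def kmeans(x, k, iters=20, seed=0):
    """Tiny k-means used only to keep the sketch dependency-free."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def sample_training_pair(complete_window, observed_masks, labels, rng):
    """Pick a non-empty cluster uniformly, then one observed mask from it,
    and replace the masked positions of the complete window with NaN."""
    cluster = rng.choice(np.unique(labels))
    candidates = np.flatnonzero(labels == cluster)
    mask = observed_masks[rng.choice(candidates)]
    corrupted = np.where(mask, np.nan, complete_window)
    return corrupted, mask

# Toy data (made up): scattered dropouts and a long contiguous outage.
rng = np.random.default_rng(0)
length = 24
scattered = rng.random((20, length)) < 0.1
outage = np.zeros((20, length), dtype=bool)
outage[:, 6:18] = True
observed_masks = np.vstack([scattered, outage])

labels = kmeans(mask_features(observed_masks), k=2)
window = np.sin(np.linspace(0, 2 * np.pi, length))
corrupted, mask = sample_training_pair(window, observed_masks, labels, rng)
```

Sampling the cluster first and the mask second is one simple way to keep rare patterns, such as long outages, from being drowned out by common ones. Whether DIM-SUM weights clusters this way is not stated in the abstract.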