🤖 AI Summary
Missing values are a fundamental challenge in time-series modeling. Conventional two-stage impute-then-model pipelines suffer from error propagation and from a trade-off between imputation efficacy and imputation complexity, while existing end-to-end approaches often scale poorly or fail to fully exploit partially observed features.
Method: We propose MissTSM, an end-to-end, imputation-free modeling framework. It introduces joint timestep-feature tokenization and a Missing-Feature-Aware Attention (MFAA) layer to explicitly capture dynamic dependencies among partially observable variables. Integrated with missing-pattern encoding, differentiable attention masking, and a Transformer backbone, MissTSM enables fully differentiable training.
Contribution/Results: Evaluated across multiple benchmark datasets, MissTSM consistently outperforms state-of-the-art imputation-based and single-stage methods in both prediction accuracy and robustness under varying missingness patterns.
📝 Abstract
A significant challenge in time-series (TS) modeling is the presence of missing values in real-world TS datasets. Traditional two-stage frameworks, which perform imputation followed by modeling, suffer from two key drawbacks: (1) the propagation of imputation errors into subsequent TS modeling, and (2) the trade-off between imputation efficacy and imputation complexity. While one-stage approaches attempt to address these limitations, they often struggle with scalability or fail to fully leverage partially observed features. To this end, we propose an imputation-free approach for handling missing values in time series, termed Missing Feature-aware Time Series Modeling (MissTSM), with two main innovations. First, we develop a novel embedding scheme that treats every combination of time-step and feature (or channel) as a distinct token. Second, we introduce a novel Missing Feature-Aware Attention (MFAA) Layer that learns a latent representation at every time-step from the features observed at that time-step. We evaluate the effectiveness of MissTSM in handling missing values over multiple benchmark datasets.
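The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the per-feature scale-and-bias embedding, the single shared query vector, the single attention head, and all names (`tokenize`, `masked_feature_attention`, `w`, `b`, `query`, `w_k`, `w_v`) are assumptions made for illustration. It shows only how each (time-step, feature) value might become its own token, and how attention over the features of a time-step can ignore the missing ones without imputing them.

```python
import numpy as np

def tokenize(x, w, b):
    """Embed every (time-step, feature) scalar as its own D-dimensional token.

    x: (T, N) series with NaN at missing entries.
    w, b: (N, D) per-feature scale and bias (hypothetical parameterization).
    Returns tokens of shape (T, N, D) and a boolean observed-mask of shape (T, N).
    """
    observed = ~np.isnan(x)
    # The 0.0 is only a placeholder so the arithmetic is defined;
    # these positions receive zero attention weight later.
    x_filled = np.where(observed, x, 0.0)
    tokens = x_filled[:, :, None] * w[None, :, :] + b[None, :, :]
    return tokens, observed

def masked_feature_attention(tokens, observed, query, w_k, w_v):
    """Single-head attention over the features of each time-step, skipping missing ones.

    tokens: (T, N, D), observed: (T, N), query: (D,), w_k and w_v: (D, D).
    Returns a (T, D) latent per time-step and the (T, N) attention weights.
    """
    d = tokens.shape[-1]
    keys = tokens @ w_k                      # (T, N, D)
    values = tokens @ w_v                    # (T, N, D)
    scores = keys @ query / np.sqrt(d)       # (T, N)
    scores = np.where(observed, scores, -np.inf)

    # Numerically stable softmax that also tolerates fully missing time-steps.
    row_max = np.max(scores, axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    weights = np.exp(scores - row_max)       # exp(-inf) == 0 for missing features
    denom = weights.sum(axis=1, keepdims=True)
    weights = np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)

    latent = np.einsum("tn,tnd->td", weights, values)
    return latent, weights

# Toy example: 4 time-steps, 3 features, 8-dimensional tokens.
rng = np.random.default_rng(0)
T, N, D = 4, 3, 8
x = rng.normal(size=(T, N))
x[0, 1] = np.nan          # one missing feature at t = 0
x[2, :] = np.nan          # every feature missing at t = 2

w, b = rng.normal(size=(N, D)), rng.normal(size=(N, D))
query = rng.normal(size=D)
w_k, w_v = rng.normal(size=(D, D)), rng.normal(size=(D, D))

tokens, observed = tokenize(x, w, b)
latent, weights = masked_feature_attention(tokens, observed, query, w_k, w_v)
```

In this sketch a missing feature contributes nothing to the latent representation of its time-step, and a time-step with no observed features yields an all-zero latent. How the actual model treats fully missing time-steps is not stated in the abstract.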