🤖 AI Summary
Existing time-series model evaluation focuses on downstream tasks such as forecasting, imputation, anomaly detection, and classification, but lacks rigorous assessment of how well models capture the underlying data-generating distribution.
Method: We propose lossless compression as a novel, theoretically grounded evaluation paradigm, leveraging Shannon’s source coding theorem to equate optimal compression length with negative log-likelihood, thereby establishing a unified information-theoretic benchmark. We introduce TSCom-Bench, a standardized protocol and open-source framework that enables rapid adaptation of state-of-the-art models (e.g., TimeXer, iTransformer, PatchTST) as compression backbones.
Contribution/Results: Experiments across diverse time-series datasets demonstrate that our approach effectively uncovers latent deficiencies in distribution modeling by leading models, significantly enhancing evaluation rigor and depth. TSCom-Bench constitutes the first general-purpose, full-distribution-oriented benchmark for time-series generative modeling.
📝 Abstract
The evaluation of time series models has traditionally focused on four canonical tasks: forecasting, imputation, anomaly detection, and classification. While these tasks have driven significant progress, they primarily assess task-specific performance and do not rigorously measure whether a model captures the full generative distribution of the data. We introduce lossless compression as a new paradigm for evaluating time series models, grounded in Shannon's source coding theorem. This perspective establishes a direct equivalence between optimal compression length and the negative log-likelihood, providing a strict and unified information-theoretic criterion for modeling capacity. We then define a standardized evaluation protocol and accompanying metrics. We further propose and open-source a comprehensive evaluation framework, TSCom-Bench, which enables the rapid adaptation of time series models as backbones for lossless compression. Experiments across diverse datasets on state-of-the-art models, including TimeXer, iTransformer, and PatchTST, demonstrate that compression reveals distributional weaknesses overlooked by classic benchmarks. These findings position lossless compression as a principled task that complements and extends existing evaluation for time series modeling.
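The core equivalence invoked above can be made concrete with a small sketch. Under Shannon's source coding theorem, the optimal code length for a symbol x with model probability q(x) is -log2 q(x) bits, so the total compressed size of a sequence under a model equals that model's negative log2-likelihood. The snippet below is an illustrative toy, not part of TSCom-Bench; the probability values are hypothetical.

```python
import math

def codelength_bits(probs):
    """Optimal total code length in bits for a sequence whose symbols
    were assigned the given model probabilities; this equals the
    negative log2-likelihood of the sequence under the model."""
    return sum(-math.log2(p) for p in probs)

# Hypothetical per-symbol probabilities a model assigns to an observed
# sequence of four symbols (dyadic values chosen so the arithmetic is exact).
model_probs = [0.5, 0.25, 0.125, 0.125]

nll_bits = codelength_bits(model_probs)  # 1 + 2 + 3 + 3 = 9 bits
print(nll_bits)
```

A better density model assigns higher probability to the observed data and therefore achieves a shorter code, which is why compressed size can serve as a strict, task-agnostic score of distributional fit.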