🤖 AI Summary
This work explores non-contrastive learning for pretraining foundation models on time series, with the goal of improving downstream classification. To our knowledge, it is the first to bring a DINOv2-style self-distillation mechanism to time series modeling, integrating the Mantis tokenizer with a Transformer encoder in a student-teacher framework. The approach jointly optimizes for temporal invariance and local fine-grained structure through temporal cropping augmentation and block-wise masked reconstruction, enabling multi-objective pretraining. Extensive experiments on the UCR and UEA benchmarks show state-of-the-art performance, supporting non-contrastive self-distillation as an effective strategy for time series representation learning.
📝 Abstract
Self-supervised foundation models have achieved remarkable success across domains, including time series. However, non-contrastive methods, a paradigm that has driven major advances in computer vision, remain underexplored for time series. In this work, we adapt DINOv2-style self-distillation to pretrain a time series foundation model, using the Mantis tokenizer and Transformer encoder as our backbone. Through a student-teacher framework, our method, Utica, learns representations that capture both temporal invariance, via augmented crops, and fine-grained local structure, via patch masking. Our approach achieves state-of-the-art classification performance on both the UCR and UEA benchmarks. These results suggest that non-contrastive methods are a promising and complementary pretraining strategy for time series foundation models.
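The student-teacher self-distillation described above can be sketched in PyTorch as follows. This is a minimal illustrative sketch, not the actual Utica implementation: the tiny encoder stands in for the Mantis tokenizer plus Transformer backbone, and the temperatures, crop strategy, and EMA momentum are assumed placeholder values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in for the Mantis tokenizer + Transformer encoder backbone."""
    def __init__(self, in_len=128, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_len, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, x):
        return self.net(x)

def dino_loss(student_out, teacher_out, t_s=0.1, t_t=0.04):
    # Teacher targets are sharpened with a lower temperature and detached
    # (stop-gradient); the student matches them via cross-entropy.
    targets = F.softmax(teacher_out / t_t, dim=-1).detach()
    log_probs = F.log_softmax(student_out / t_s, dim=-1)
    return -(targets * log_probs).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(student, teacher, m=0.996):
    # Teacher weights track the student via an exponential moving average.
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(m).add_(p_s, alpha=1 - m)

# One training step on two temporal crops of the same series.
student, teacher = TinyEncoder(), TinyEncoder()
teacher.load_state_dict(student.state_dict())
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

series = torch.randn(8, 256)                       # batch of univariate series
crop_a, crop_b = series[:, :128], series[:, 128:]  # two temporal crops

# Symmetrized loss: each crop's student view predicts the other's teacher view.
loss = dino_loss(student(crop_a), teacher(crop_b)) \
     + dino_loss(student(crop_b), teacher(crop_a))
opt.zero_grad()
loss.backward()
opt.step()
ema_update(student, teacher)
```

The key non-contrastive ingredient is that no negative pairs are used: collapse is avoided through the asymmetry between student and teacher (stop-gradient, temperature sharpening, and the EMA-updated teacher), which is what distinguishes this family from contrastive time series pretraining.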