🤖 AI Summary
Addressing the scarcity of foundation models for time-series classification, this paper introduces Mantis, an open-source foundation model designed specifically for this task. Built on the Vision Transformer (ViT) architecture, Mantis is pre-trained in a self-supervised fashion using contrastive learning. To handle multivariate inputs, the authors propose several adapters that reduce memory requirements while modeling interdependence between channels. In experiments, Mantis outperforms existing time-series foundation models both when the backbone is kept frozen and when it is fine-tuned, and it achieves the lowest calibration error among the compared models. The code and pretrained models are publicly released.
📝 Abstract
In recent years, there has been increasing interest in developing foundation models for time series data that can generalize across diverse downstream tasks. While numerous forecasting-oriented foundation models have been introduced, there is a notable scarcity of models tailored for time series classification. To address this gap, we present Mantis, a new open-source foundation model for time series classification based on the Vision Transformer (ViT) architecture and pre-trained using a contrastive learning approach. Our experimental results show that Mantis outperforms existing foundation models both when the backbone is frozen and when it is fine-tuned, while achieving the lowest calibration error. In addition, we propose several adapters to handle the multivariate setting, reducing memory requirements and modeling channel interdependence.
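The abstract states that Mantis is pre-trained with a contrastive learning approach but does not give the exact objective. A common choice in contrastive self-supervised pretraining is an InfoNCE-style loss, where each anchor embedding is pulled toward the embedding of its own augmented view and pushed away from the other samples in the batch. Below is a minimal, dependency-free sketch of such a loss; it is illustrative only and is not claimed to be the loss used in the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss (illustrative sketch).

    anchors[i] and positives[i] are embeddings of two views of the
    same series; every positives[j] with j != i acts as a negative.
    Returns the mean cross-entropy over the batch.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature
                  for j in range(n)]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += -(logits[i] - log_denom)
    return total / n
```

As a sanity check, the loss should be small when each anchor is closest to its own positive and large when the pairing is scrambled, which is exactly the pressure that shapes the pretrained representation.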