Quantifying the Pre-training Dividend: Generative versus Latent Self-Supervised Learning for Time Series Foundation Models

📅 2026-05-19

🤖 AI Summary

This study addresses the lack of a systematic, quantitative evaluation of the practical benefits that self-supervised pre-training brings to time series models across diverse downstream tasks. The authors build a controlled evaluation framework to compare generative approaches against latent-alignment approaches. For the latter, they adapt LeJEPA and DINO to time series, using data augmentations based on the discrete wavelet transform (DWT) to enforce invariance to local perturbations. They quantify the asymmetry of the "pre-training dividend" and show that representation utility is governed by a precision-invariance trade-off: the signal resolution a task requires must match what the learning objective preserves. Representation quality turns out to be largely independent of data provenance and saturates at moderate model depths. Experiments show pre-training gains of up to 375% on anomaly detection and classification but only marginal gains on forecasting. The provenance finding suggests that these models could be scaled with large-scale synthetic data.
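To make the headline percentage concrete, the snippet below computes a relative pre-training gain. The summary does not state the exact formula. This sketch assumes the dividend is the relative improvement of a pretrained model's downstream score over the same architecture trained from scratch, with the sign flipped for lower-is-better metrics such as forecasting MSE. `pre_training_dividend` is a hypothetical helper, and the scores are toy values, not results from the paper.

```python
def pre_training_dividend(pretrained: float, scratch: float,
                          higher_is_better: bool = True) -> float:
    """Relative gain (%) of a pretrained model over a from-scratch baseline.

    Hypothetical definition for illustration only; the paper's exact
    formula is not given in this summary.
    """
    if scratch == 0:
        raise ValueError("baseline score must be non-zero")
    # For error metrics (e.g. MSE) an improvement is a decrease.
    gain = pretrained - scratch if higher_is_better else scratch - pretrained
    return 100.0 * gain / abs(scratch)


# Toy numbers (not from the paper): an F1 score and a forecasting MSE.
print(round(pre_training_dividend(0.95, 0.20), 2))                          # 375.0
print(round(pre_training_dividend(0.48, 0.50, higher_is_better=False), 2))  # 4.0
```

Under this assumed definition, a 375% dividend means the pretrained model scores 4.75 times the from-scratch baseline. A gain of a few percent on an error metric would correspond to the "marginal" forecasting regime.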

📝 Abstract

The success of self-supervised learning (SSL) in vision and NLP has motivated its rapid adoption for time series. However, research has focused primarily on Generative paradigms and forecasting tasks, leaving the broader utility of learned representations unquantified. We establish a controlled framework to evaluate the "pre-training dividend": the value added by SSL across diverse temporal tasks. We systematically compare Generative paradigms against Latent Alignment architectures, introducing adaptations of LeJEPA and DINO for time series. These adaptations utilize Discrete Wavelet Transform (DWT) augmentations to enforce invariance to local fluctuations. Our analysis reveals that the pre-training dividend is highly asymmetric: SSL yields gains of up to 375% for anomaly detection and classification, yet remains marginal for forecasting. We demonstrate that representational utility is non-universal, governed by a precision-invariance trade-off where the specific signal resolution required by the task must align with the objective. Finally, we show that representation quality is largely independent of data origin and saturates at moderate architectural depths, suggesting a path to scaling via massive synthetic generation. Our code is available at: https://github.com/noammajor/Models
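The abstract says the LeJEPA and DINO adaptations use DWT augmentations to enforce invariance to local fluctuations, but it does not specify the transform. The sketch below is one plausible reading and is not the authors' implementation. It takes a single-level orthonormal Haar DWT, randomly rescales only the detail (high-frequency) coefficients, and then inverts the transform. As a result, each pair of samples keeps its coarse sum (up to floating-point rounding), while local fluctuations differ between augmented views. The function names, the Haar wavelet, and the multiplicative Gaussian jitter are all assumptions. A real pipeline would more likely use a wavelet library such as PyWavelets with several decomposition levels.

```python
import math
import random

_S = 1 / math.sqrt(2)  # orthonormal Haar scaling factor


def haar_dwt(x):
    """Single-level Haar DWT of an even-length sequence -> (approx, detail)."""
    if len(x) % 2:
        raise ValueError("sequence length must be even")
    approx = [(x[i] + x[i + 1]) * _S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * _S for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild and interleave each sample pair."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) * _S, (a - d) * _S))
    return out


def dwt_detail_augment(x, noise_std=0.1, rng=None):
    """Hypothetical DWT augmentation: jitter detail coefficients only.

    Approximation (low-frequency) coefficients are left untouched, so the
    coarse shape survives while local fluctuations change between views.
    """
    rng = rng if rng is not None else random.Random()
    approx, detail = haar_dwt(x)
    jittered = [d * (1.0 + rng.gauss(0.0, noise_std)) for d in detail]
    return haar_idwt(approx, jittered)


# Two augmented "views" of one series, of the kind a latent-alignment
# method (e.g. a DINO-style student/teacher pair) would compare.
series = [1.0, 3.0, 2.0, 5.0, 4.0, 4.0, 6.0, 8.0]
rng = random.Random(0)
view_a = dwt_detail_augment(series, noise_std=0.2, rng=rng)
view_b = dwt_detail_augment(series, noise_std=0.2, rng=rng)
```

Perturbing only the detail band is what makes the two views agree on coarse structure. A latent-alignment objective trained on such pairs is therefore pushed toward invariance to fine-scale fluctuations. That same invariance would explain why such representations help classification more than forecasting, which needs precise local values.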

Problem

Research questions and friction points this paper is trying to address.

self-supervised learning

time series

pre-training dividend

representation utility

forecasting

Innovation

Methods, ideas, or system contributions that make the work stand out.

self-supervised learning

time series foundation models

latent alignment