🤖 AI Summary
Pretraining time-series foundation models (TSFMs) relies heavily on large-scale real-world data, incurring prohibitive computational costs. Method: We propose CauKer, a synthetic data generation framework that integrates Gaussian process kernel composition with structural causal modeling to efficiently produce diverse time-series data exhibiting realistic trends, seasonality, and nonlinear causal interactions. Contribution/Results: CauKer enables the first efficient pretraining of classification-oriented TSFMs using *only* synthetic data. We identify stable, predictable scaling laws in both data volume and model size, in contrast to the irregular scaling observed with real data. Across multiple architectures and pretraining paradigms, models pretrained exclusively on synthetic data achieve zero-shot classification performance on par with or exceeding that of models pretrained on real data. This demonstrates the feasibility, effectiveness, and scalability of synthetic-data-driven pretraining for TSFMs.
📝 Abstract
Time series foundation models (TSFMs) have recently gained significant attention due to their strong zero-shot capabilities and widespread real-world applications. Such models typically require computationally costly pretraining on large-scale, carefully curated collections of real-world sequences. To enable sample-efficient pretraining of TSFMs, we propose CauKer, a novel algorithm designed to generate diverse, causally coherent synthetic time series with realistic trends, seasonality, and nonlinear interactions. CauKer combines Gaussian Process (GP) kernel composition with Structural Causal Models (SCMs) to produce data for sample-efficient pretraining of state-of-the-art classification TSFMs with different architectures and pretraining paradigms. Additionally, our experiments reveal that CauKer-generated datasets exhibit clear scaling laws for both dataset size (10K to 10M samples) and model capacity (1M to 783M parameters), unlike real-world datasets, which display irregular scaling behavior.
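To make the core idea concrete, here is a minimal sketch of how GP kernel composition and an SCM could be combined to synthesize causally linked series. This is an illustrative assumption, not the authors' implementation: the specific base kernels (RBF, periodic, linear), the random edge probability, and the `tanh` causal mechanism are all placeholders chosen for clarity.

```python
import numpy as np

# Base GP kernels over a 1-D time grid (all positive semi-definite)
def rbf(t, ell):
    return np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / ell ** 2)

def periodic(t, period, ell):
    d = np.abs(t[:, None] - t[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ell ** 2)

def linear(t, c):
    return (t[:, None] - c) * (t[None, :] - c)

def sample_composed_gp(t, rng):
    """Randomly compose base kernels by sums/products, then draw one GP sample."""
    bases = [rbf(t, rng.uniform(0.05, 0.5)),
             periodic(t, rng.uniform(0.1, 0.5), 1.0),
             linear(t, rng.uniform(0.0, 1.0))]
    K = bases[rng.integers(len(bases))]
    for _ in range(rng.integers(1, 3)):          # 1-2 extra composition steps
        K2 = bases[rng.integers(len(bases))]
        K = K + K2 if rng.random() < 0.5 else K * K2
    K = K + 1e-6 * np.eye(len(t))                # jitter for numerical stability
    return rng.multivariate_normal(np.zeros(len(t)), K)

def sample_scm_series(n_nodes, t, rng, edge_prob=0.5):
    """Random DAG in topological order: node i may depend on nodes j < i.

    Each node is driven by its own composed-GP sample (exogenous noise with
    trend/seasonality) plus nonlinear effects from its causal parents.
    """
    series = []
    for i in range(n_nodes):
        x = sample_composed_gp(t, rng)
        for j in range(i):
            if rng.random() < edge_prob:         # random edge j -> i
                x = x + np.tanh(rng.normal() * series[j])
        series.append(x)
    return np.stack(series)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 128)
X = sample_scm_series(4, t, rng)
print(X.shape)  # (4, 128): four causally linked synthetic channels
```

Because the kernels are composed before sampling, each channel inherits realistic trend and seasonal structure, while the DAG traversal injects the nonlinear causal interactions; a pretraining corpus would repeat this with freshly randomized kernels and graphs.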