Investigating Compositional Reasoning in Time Series Foundation Models

📅 2025-02-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper investigates whether large-scale time series foundation models (TSFMs) possess genuine compositional reasoning capabilities, beyond mere memorization of training patterns. Method: The authors formally define compositional reasoning in time series forecasting, distinguish it from in-distribution generalization, and introduce an evaluation framework combining synthetic and real-world data. Leveraging compositional generalization tests inspired by the language modeling literature, together with controlled ablation studies, they systematically assess 23 deep learning forecasting models and quantify the impact of architectural components (e.g., tokenization, attention mechanisms, residual structures). Contribution/Results: Patch-based Transformers exhibit the strongest compositional reasoning, closely followed by residualized MLP-based architectures that use 97% fewer FLOPs and 86% fewer trainable parameters. In some zero-shot out-of-distribution (OOD) scenarios, these models can outperform moving average and exponential smoothing statistical baselines trained on in-distribution data. Among the design choices studied, the tokenization method stands out for its significant negative impact on Transformer reasoning performance.

📝 Abstract
Large pre-trained time series foundation models (TSFMs) have demonstrated promising zero-shot performance across a wide range of domains. However, a question remains: Do TSFMs succeed solely by memorizing training patterns, or do they possess the ability to reason? While reasoning is a topic of great interest in the study of Large Language Models (LLMs), it is undefined and largely unexplored in the context of TSFMs. In this work, inspired by language modeling literature, we formally define compositional reasoning in forecasting and distinguish it from in-distribution generalization. We evaluate the reasoning and generalization capabilities of 23 popular deep learning forecasting models on multiple synthetic and real-world datasets. Additionally, through controlled studies, we systematically examine which design choices in TSFMs contribute to improved reasoning abilities. Our study yields key insights into the impact of TSFM architecture design on compositional reasoning and generalization. We find that patch-based Transformers have the best reasoning performance, closely followed by residualized MLP-based architectures, which are 97% less computationally complex in terms of FLOPs and 86% smaller in terms of the number of trainable parameters. Interestingly, in some zero-shot out-of-distribution scenarios, these models can outperform moving average and exponential smoothing statistical baselines trained on in-distribution data. Only a few design choices, such as the tokenization method, had a significant (negative) impact on Transformer model performance.
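The paper's exact benchmark construction is not given here, but the core idea of a compositional generalization test for forecasting can be sketched: train on series generated from individual basis patterns, then evaluate on series whose *combination* of patterns never appears in training. The component functions, lengths, and noise level below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def trend(t, slope=0.05):
    # Simple linear trend component.
    return slope * t

def seasonal(t, period=24.0):
    # Simple sinusoidal seasonal component.
    return np.sin(2 * np.pi * t / period)

def make_series(components, n=200, noise=0.1, seed=0):
    # Sum the chosen components and add Gaussian observation noise.
    rng = np.random.default_rng(seed)
    t = np.arange(n, dtype=float)
    clean = sum(f(t) for f in components)
    return clean + rng.normal(0.0, noise, n)

# In-distribution training data: each pattern is seen in isolation.
train_series = [make_series([trend]), make_series([seasonal])]

# Compositional OOD test data: the trend+seasonal *combination*
# is never observed during training, so good forecasts require
# composing known primitives rather than recalling a seen pattern.
test_series = [make_series([trend, seasonal])]
```

A model that merely memorizes training shapes should degrade on `test_series`, while a model with compositional reasoning ability should still forecast it accurately.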
Problem

Research questions and friction points this paper is trying to address.

Defining compositional reasoning in time series forecasting.
Evaluating reasoning and generalization in 23 forecasting models.
Identifying design choices that enhance TSFM reasoning abilities.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Patch-based Transformers achieve the best compositional reasoning performance.
Residualized MLP-based architectures closely match Transformers while using 97% fewer FLOPs and 86% fewer trainable parameters.
Tokenization method is identified as a design choice with a significant negative impact on Transformer performance.
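The specific residualized MLP studied in the paper is not detailed here; as a rough illustration of the architecture family, a minimal sketch might map a context window to a forecast horizon through an MLP with a skip (residual) connection. Layer sizes, the activation, and the residual placement below are all assumptions.

```python
import numpy as np

class ResidualMLPForecaster:
    """Toy residualized MLP forecaster (hypothetical sketch):
    a context window of past values is projected to a hidden state,
    refined through a residual hidden layer, and projected to the
    forecast horizon. Not the paper's exact architecture."""

    def __init__(self, context=32, horizon=8, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(0.0, 0.1, (context, hidden))
        self.w_hid = rng.normal(0.0, 0.1, (hidden, hidden))
        self.w_out = rng.normal(0.0, 0.1, (hidden, horizon))

    def forward(self, x):
        h = np.tanh(x @ self.w_in)
        h = h + np.tanh(h @ self.w_hid)  # residual (skip) connection
        return h @ self.w_out

model = ResidualMLPForecaster()
forecast = model.forward(np.ones(32))  # 32-step context -> 8-step forecast
```

Even untrained, the sketch shows why such models are lightweight: the parameter count is just the three weight matrices, far below a comparable Transformer's.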