TimeSqueeze: Dynamic Patching for Efficient Time Series Forecasting

📅 2026-03-11

📈 Citations: 0

✨ Influential: 0

career value

170K/year

🤖 AI Summary

Transformer-based time series models face a trade-off between computational efficiency and information fidelity under fixed chunking strategies. This work proposes a content-aware dynamic chunking mechanism that adaptively adjusts chunk boundaries based on local signal complexity, enabling fine-grained representation in information-dense regions and coarse-grained aggregation in smooth segments. The approach integrates a lightweight state-space encoder with a dynamic chunking algorithm to deliver compressed yet informative temporal inputs to the Transformer. During large-scale pretraining, the model achieves up to 20× faster convergence and 8× improved data efficiency compared to baseline methods, while setting new state-of-the-art results on long-horizon forecasting benchmarks.

Technology Category

Application Category

📝 Abstract

Transformer-based time series foundation models face a fundamental trade-off in choice of tokenization: point-wise embeddings preserve temporal fidelity but scale poorly with sequence length, whereas fixed-length patching improves efficiency by imposing uniform boundaries that may disrupt natural transitions and blur informative local dynamics. In order to address these limitations, we introduce TimeSqueeze, a dynamic patching mechanism that adaptively selects patch boundaries within each sequence based on local signal complexity. TimeSqueeze first applies a lightweight state-space encoder to extract full-resolution point-wise features, then performs content-aware segmentation by allocating short patches to information-dense regions and long patches to smooth or redundant segments. This variable-resolution compression preserves critical temporal structure while substantially reducing the token sequence presented to the Transformer backbone. Specifically for large-scale pretraining, TimeSqueeze attains up to 20x faster convergence and 8x higher data efficiency compared to equivalent point-token baselines. Experiments across long-horizon forecasting benchmarks show that TimeSqueeze consistently outperforms comparable architectures that use either point-wise tokenization or fixed-size patching.

Problem

Research questions and friction points this paper is trying to address.

time series forecasting

tokenization

patching

temporal fidelity

sequence length scalability

Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic patching

time series forecasting

Transformer efficiency