🤖 AI Summary
Transformer-based time series models face a trade-off between computational efficiency and information fidelity under fixed chunking strategies. This work proposes a content-aware dynamic chunking mechanism that adaptively adjusts chunk boundaries based on local signal complexity, enabling fine-grained representation in information-dense regions and coarse-grained aggregation in smooth segments. The approach integrates a lightweight state-space encoder with a dynamic chunking algorithm to deliver compressed yet informative temporal inputs to the Transformer. During large-scale pretraining, the model achieves up to 20× faster convergence and 8× improved data efficiency compared to baseline methods, while setting new state-of-the-art results on long-horizon forecasting benchmarks.
📝 Abstract
Transformer-based time series foundation models face a fundamental trade-off in choice of tokenization: point-wise embeddings preserve temporal fidelity but scale poorly with sequence length, whereas fixed-length patching improves efficiency by imposing uniform boundaries that may disrupt natural transitions and blur informative local dynamics. In order to address these limitations, we introduce TimeSqueeze, a dynamic patching mechanism that adaptively selects patch boundaries within each sequence based on local signal complexity. TimeSqueeze first applies a lightweight state-space encoder to extract full-resolution point-wise features, then performs content-aware segmentation by allocating short patches to information-dense regions and long patches to smooth or redundant segments. This variable-resolution compression preserves critical temporal structure while substantially reducing the token sequence presented to the Transformer backbone. Specifically for large-scale pretraining, TimeSqueeze attains up to 20x faster convergence and 8x higher data efficiency compared to equivalent point-token baselines. Experiments across long-horizon forecasting benchmarks show that TimeSqueeze consistently outperforms comparable architectures that use either point-wise tokenization or fixed-size patching.