🤖 AI Summary
This work addresses the high computational complexity of self-attention in Transformers for long-term time series forecasting and the mismatch between fixed-length tokenization and the intrinsic semantic structure of time series data. To this end, the authors propose a parameter-free, B-spline-based adaptive tokenization method that dynamically concentrates tokens in high-curvature regions by fitting B-spline curves, yielding a semantics-aware compressed representation. They further introduce a hybrid positional encoding scheme that combines learnable embeddings with Rotary Positional Embedding using a layer-wise learnable base (L-RoPE), enhancing the model's capacity to capture multi-scale temporal dependencies. Experimental results demonstrate that the proposed approach achieves competitive performance across multiple public benchmarks, with particularly strong results under high compression ratios and memory-constrained settings.
📝 Abstract
Long-term time series forecasting with Transformers is hampered by the quadratic complexity of self-attention and by the rigidity of uniform patching, which may be misaligned with the data's semantic structure. In this paper, we introduce the \textit{B-Spline Adaptive Tokenizer (BSAT)}, a novel, parameter-free method that adaptively segments a time series by fitting it with B-splines. BSAT algorithmically places tokens in high-curvature regions and represents each variable-length basis function as a fixed-size token composed of its coefficient and position. Furthermore, we propose a hybrid positional encoding that combines an additive learnable positional encoding with Rotary Positional Embedding using a layer-wise learnable base (L-RoPE), allowing each layer to attend to different temporal dependencies. Experiments on several public benchmarks show that our model is competitive, with particularly strong performance at high compression rates, making it especially well-suited for memory-constrained use cases.