🤖 AI Summary
This work addresses a limitation of conventional Transformers: their positional encodings assume uniform, index-based time, so they struggle to model non-uniform, time-warped dynamical processes. The authors propose Symplectic Positional Embeddings (SyPE), which extend the rotation group SO(2) underlying Rotary Position Embedding (RoPE) to the symplectic group Sp(2,ℝ), strictly generalizing RoPE so that non-affine time warping can be represented. They also introduce an input-dependent adaptive warp module that lets the attention mechanism learn local, time-varying temporal rhythms end-to-end. Implemented in the StretchTime architecture and evaluated on standard multivariate time series forecasting benchmarks, the method is reported to achieve state-of-the-art performance and improved robustness, particularly on data with non-stationary temporal dynamics.
📝 Abstract
Transformer architectures have established strong baselines in time series forecasting, yet they typically rely on positional encodings that assume uniform, index-based temporal progression. However, real-world systems, from shifting financial cycles to elastic biological rhythms, frequently exhibit "time-warped" dynamics where the effective flow of time decouples from the sampling index. In this work, we first formalize this misalignment and prove that Rotary Position Embedding (RoPE) is mathematically incapable of representing non-affine temporal warping. To address this, we propose Symplectic Positional Embeddings (SyPE), a learnable encoding framework derived from Hamiltonian mechanics. SyPE strictly generalizes RoPE by extending the rotation group $\mathrm{SO}(2)$ to the symplectic group $\mathrm{Sp}(2,\mathbb{R})$, modulated by a novel input-dependent adaptive warp module. By allowing the attention mechanism to adaptively dilate or contract temporal coordinates end-to-end, our approach captures locally varying periodicities without requiring pre-defined warping functions. We implement this mechanism in StretchTime, a multivariate forecasting architecture that achieves state-of-the-art performance on standard benchmarks, demonstrating superior robustness on datasets exhibiting non-stationary temporal dynamics.
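The two group-theoretic statements in the abstract can be checked numerically on a single 2-D feature pair. The sketch below is not the authors' SyPE or StretchTime implementation. It only illustrates, under stated assumptions, that (a) a RoPE-style attention score depends on the index offset alone, so a non-affine clock changes the score at a fixed offset, and (b) $\mathrm{SO}(2)$ sits inside $\mathrm{Sp}(2,\mathbb{R})$, which also contains non-orthogonal elements. The warp `tau`, the frequency `omega`, and the rotation-times-squeeze parameterization are hypothetical choices made for illustration.

```python
import math

def rotation(theta):
    """2x2 rotation matrix: the SO(2) element RoPE applies to one feature pair."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def squeeze(r):
    """Area-preserving squeeze diag(e^r, e^-r); orthogonal only when r == 0."""
    return [[math.exp(r), 0.0], [0.0, math.exp(-r)]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def apply(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

# Standard symplectic form on R^2.
J = [[0.0, 1.0], [-1.0, 0.0]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

def is_symplectic(m):
    """Membership test for Sp(2, R): M^T J M == J."""
    return close(matmul(transpose(m), matmul(J, m)), J)

def is_orthogonal(m):
    """Membership test for O(2): M^T M == I."""
    return close(matmul(transpose(m), m), IDENTITY)

def rope_score(q, k, pos_q, pos_k, omega=0.3):
    """Attention logit after rotating q and k by their positions (omega is assumed)."""
    return dot(apply(rotation(omega * pos_q), q), apply(rotation(omega * pos_k), k))

def tau(t):
    """Hypothetical non-affine warp of the sampling index."""
    return t * t / 10.0

q, k = [1.0, 0.0], [0.5, 0.5]

# (a) With raw indices the score depends only on the offset (3 in both cases).
uniform_a = rope_score(q, k, 5, 2)
uniform_b = rope_score(q, k, 13, 10)

# Under the warped clock the same index offset gives different scores,
# which a fixed-frequency rotation of the raw index cannot reproduce.
warped_a = rope_score(q, k, tau(5), tau(2))
warped_b = rope_score(q, k, tau(13), tau(10))

# (b) A rotation is symplectic; rotation times squeeze is symplectic but not orthogonal.
R = rotation(0.4)
M = matmul(R, squeeze(0.5))

print("uniform scores:", uniform_a, uniform_b)
print("warped scores: ", warped_a, warped_b)
print("R in Sp(2,R):", is_symplectic(R), "| R orthogonal:", is_orthogonal(R))
print("M in Sp(2,R):", is_symplectic(M), "| M orthogonal:", is_orthogonal(M))
```

The squeeze factor is only one way to leave $\mathrm{SO}(2)$ while staying in $\mathrm{Sp}(2,\mathbb{R})$. How SyPE actually parameterizes the group element and couples it to the adaptive warp module is not specified in the abstract.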