🤖 AI Summary
Transformers face challenges in long-term time series forecasting, including high computational overhead and limited temporal modeling capability. This paper proposes FreezeTST—a lightweight hybrid architecture that alternately stacks frozen random-feature modules with trainable Transformer layers, drawing on reservoir computing principles to introduce fixed stochastic dynamics that supply nonlinear memory. This design reduces trainable parameters by up to 90% and substantially shortens training time, while preserving the standard Transformer’s inference complexity. By combining random feature mapping with self-attention, FreezeTST improves robustness in temporal modeling without increasing optimization difficulty. Evaluated on seven long-horizon forecasting benchmarks, FreezeTST matches or surpasses specialized models such as Informer and Autoformer, achieving an average 65% reduction in FLOPs.
📝 Abstract
Transformers are the de facto choice for sequence modelling, yet their quadratic self-attention and weak temporal bias can make long-range forecasting both expensive and brittle. We introduce FreezeTST, a lightweight hybrid that interleaves frozen random-feature (reservoir) blocks with standard trainable Transformer layers. The frozen blocks endow the network with rich nonlinear memory at no optimisation cost; the trainable layers learn to query this memory through self-attention. The design cuts trainable parameters and lowers wall-clock training time, while leaving inference complexity unchanged. On seven standard long-term forecasting benchmarks, FreezeTST consistently matches or surpasses specialised variants such as Informer, Autoformer, and PatchTST, with substantially lower compute. Our results show that embedding reservoir principles within Transformers offers a simple, principled route to efficient long-term time-series prediction.
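To make the frozen random-feature idea concrete, below is a minimal NumPy sketch of a reservoir-style block whose weights are sampled once and never trained. All names, sizes, and the leaky-tanh update are illustrative assumptions, not the paper's actual implementation; in FreezeTST, trainable Transformer layers would sit between such frozen blocks and learn to query their states via self-attention.

```python
import numpy as np

rng = np.random.default_rng(0)

class FrozenReservoirBlock:
    """Illustrative frozen random-feature (reservoir) block.

    Weights are drawn once at construction and never updated, so the
    block contributes zero trainable parameters. Hyperparameters and
    the update rule are assumptions for illustration only.
    """

    def __init__(self, d_model, d_reservoir):
        # Fixed random input projection, recurrent matrix, and readout.
        self.w_in = rng.normal(0, 1 / np.sqrt(d_model), (d_model, d_reservoir))
        self.w_res = rng.normal(0, 1 / np.sqrt(d_reservoir), (d_reservoir, d_reservoir))
        self.w_out = rng.normal(0, 1 / np.sqrt(d_reservoir), (d_reservoir, d_model))

    def __call__(self, x):
        # x: (seq_len, d_model). A simple recurrent pass builds
        # nonlinear memory of the input history at no training cost.
        h = np.zeros(self.w_res.shape[0])
        states = []
        for t in range(x.shape[0]):
            h = np.tanh(x[t] @ self.w_in + h @ self.w_res)
            states.append(h)
        # Project the reservoir states back to model width so the next
        # (trainable) layer can attend over them.
        return np.stack(states) @ self.w_out

seq_len, d_model = 16, 8
block = FrozenReservoirBlock(d_model=d_model, d_reservoir=64)
out = block(rng.normal(size=(seq_len, d_model)))
print(out.shape)
```

Because every matrix here is fixed, only the interleaved Transformer layers would contribute to the trainable-parameter count, which is the source of the reported parameter and training-time savings.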