Efficient Time Series Processing for Transformers and State-Space Models through Token Merging

📅 2024-05-28
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
To address the high computational overhead of Transformers and state-space models (SSMs) when modeling long time series, this paper presents the first investigation of token merging in time-series analysis. It proposes local merging, a domain-adapted algorithm that restricts merging to tokens within a local subsequence neighborhood and aggregates them as a linear weighted combination, preserving local dependency modeling while reducing sequence length. The method achieves inference speedups of up to 5400% on state-of-the-art time-series foundation models such as Chronos with only minor accuracy degradation, and performs robustly across diverse architectures and benchmark datasets, substantially improving throughput on long sequences. This work establishes a pathway toward efficient deployment of large-scale time-series models.

📝 Abstract
Transformer architectures have shown promising results in time series processing. However, despite recent advances in subquadratic attention mechanisms and state-space models, processing very long sequences still imposes significant computational requirements. Token merging, which replaces multiple tokens with a single one computed as their linear combination, has been shown to considerably improve the throughput of vision transformer architectures while maintaining accuracy. In this work, we go beyond computer vision and perform the first investigations of token merging in time series analysis on both time series transformers and state-space models. To effectively scale token merging to long sequences, we introduce local merging, a domain-specific token merging algorithm that selectively combines tokens within a local neighborhood, adjusting the computational complexity from linear to quadratic based on the neighborhood size. Our comprehensive empirical evaluation demonstrates that token merging offers substantial computational benefits with minimal impact on accuracy across various models and datasets. On the recently proposed Chronos foundation model, we achieve accelerations up to 5400% with only minor accuracy degradations.
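The core ideas in the abstract — replacing tokens with their linear combination, and restricting candidate merges to a local neighborhood so the pair-similarity cost stays controllable — can be sketched as follows. This is a simplified illustration, not the paper's implementation: the window size, similarity measure, and one-merge-per-window schedule are assumptions, and the `sizes` vector tracks how many original tokens each merged token represents so the aggregation remains a proper weighted average.

```python
import numpy as np

def local_merge(tokens, window=4, r=1):
    """Hypothetical sketch of local token merging: within each
    non-overlapping window of `window` tokens, merge the r most
    similar adjacent pairs into their size-weighted average."""
    tokens = tokens.copy()
    n, d = tokens.shape
    sizes = np.ones(n)  # number of original tokens behind each token
    out_tok, out_sz = [], []
    for start in range(0, n, window):
        win = list(range(start, min(start + window, n)))
        for _ in range(r):
            if len(win) < 2:
                break
            # cosine similarity of each adjacent pair inside the window;
            # restricting comparisons to the window keeps cost near-linear
            sims = [
                tokens[win[i]] @ tokens[win[i + 1]]
                / (np.linalg.norm(tokens[win[i]])
                   * np.linalg.norm(tokens[win[i + 1]]) + 1e-9)
                for i in range(len(win) - 1)
            ]
            i = int(np.argmax(sims))
            a, b = win[i], win[i + 1]
            total = sizes[a] + sizes[b]
            # linear weighted aggregation of the two tokens
            tokens[a] = (sizes[a] * tokens[a] + sizes[b] * tokens[b]) / total
            sizes[a] = total
            del win[i + 1]
        out_tok.extend(tokens[j] for j in win)
        out_sz.extend(sizes[j] for j in win)
    return np.stack(out_tok), np.array(out_sz)
```

With `window` equal to the full sequence length this degenerates toward a global (quadratic) pairwise search, while small windows keep the comparison cost linear in sequence length, mirroring the complexity trade-off the abstract describes.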
Problem

Research questions and friction points this paper is trying to address.

Efficient processing of long token sequences in time series analysis
Developing local merging for scalable, causality-preserving token reduction
Predicting merging benefits via spectral properties without task evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Local merging algorithm for time series
Adjustable computational complexity scaling
Spectral properties predict merging benefits