Efficient Time Series Processing for Transformers and State-Space Models through Token Merging

📅 2024-05-28
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
To address the high computational overhead of Transformers and state-space models (SSMs) when modeling long time series, this paper presents the first investigation of token merging in time-series analysis. It proposes local merging, a domain-adapted algorithm that restricts merging to tokens within a local subsequence neighborhood and aggregates them as a linear weighted combination, preserving local dependency modeling while reducing sequence length. The method achieves inference speedups of up to 5400% on state-of-the-art time-series foundation models such as Chronos with only minor accuracy degradation, and performs robustly across diverse architectures and benchmark datasets, substantially improving throughput on long sequences. This work establishes a pathway toward efficient deployment of large-scale time-series models.

📝 Abstract
Transformer architectures have shown promising results in time series processing. However, despite recent advances in subquadratic attention mechanisms and state-space models, processing very long sequences still imposes significant computational requirements. Token merging, which replaces multiple tokens with a single one computed as their linear combination, has been shown to considerably improve the throughput of vision transformer architectures while maintaining accuracy. In this work, we go beyond computer vision and perform the first investigations of token merging in time series analysis on both time series transformers and state-space models. To effectively scale token merging to long sequences, we introduce local merging, a domain-specific token merging algorithm that selectively combines tokens within a local neighborhood, adjusting the computational complexity from linear to quadratic based on the neighborhood size. Our comprehensive empirical evaluation demonstrates that token merging offers substantial computational benefits with minimal impact on accuracy across various models and datasets. On the recently proposed Chronos foundation model, we achieve accelerations up to 5400% with only minor accuracy degradations.
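The core ideas in the abstract — replacing tokens with their linear combination, and restricting candidate merges to a local neighborhood so the pair-similarity cost stays controllable — can be sketched as follows. This is a simplified illustration, not the paper's implementation: the window size, similarity measure, and one-merge-per-window schedule are assumptions, and the `sizes` vector tracks how many original tokens each merged token represents so the aggregation remains a proper weighted average.

```python
import numpy as np

def local_merge(tokens, window=4, r=1):
    """Hypothetical sketch of local token merging: within each
    non-overlapping window of `window` tokens, merge the r most
    similar adjacent pairs into their size-weighted average."""
    tokens = tokens.copy()
    n, d = tokens.shape
    sizes = np.ones(n)  # number of original tokens behind each token
    out_tok, out_sz = [], []
    for start in range(0, n, window):
        win = list(range(start, min(start + window, n)))
        for _ in range(r):
            if len(win) < 2:
                break
            # cosine similarity of each adjacent pair inside the window;
            # restricting comparisons to the window keeps cost near-linear
            sims = [
                tokens[win[i]] @ tokens[win[i + 1]]
                / (np.linalg.norm(tokens[win[i]])
                   * np.linalg.norm(tokens[win[i + 1]]) + 1e-9)
                for i in range(len(win) - 1)
            ]
            i = int(np.argmax(sims))
            a, b = win[i], win[i + 1]
            total = sizes[a] + sizes[b]
            # linear weighted aggregation of the two tokens
            tokens[a] = (sizes[a] * tokens[a] + sizes[b] * tokens[b]) / total
            sizes[a] = total
            del win[i + 1]
        out_tok.extend(tokens[j] for j in win)
        out_sz.extend(sizes[j] for j in win)
    return np.stack(out_tok), np.array(out_sz)
```

With `window` equal to the full sequence length this degenerates toward a global (quadratic) pairwise search, while small windows keep the comparison cost linear in sequence length, mirroring the complexity trade-off the abstract describes.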
Problem

Research questions and friction points this paper is trying to address.

Efficient processing of long token sequences in time series analysis
Developing local merging for scalable, causality-preserving token reduction
Predicting merging benefits via spectral properties without task evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Local merging algorithm for time series
Adjustable computational complexity scaling
Spectral properties predict merging benefits