Efficient State Space Model via Fast Tensor Convolution and Block Diagonalization

πŸ“… 2024-02-23
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the high parameter count and low computational efficiency of state space models (SSMs) in long-sequence modeling, this paper proposes the efficient SSM (eSSM) layer. Leveraging the convolutional representation of multi-input multi-output SSMs, eSSM integrates system matrix diagonalization, FFT-accelerated fast tensor convolution, and a novel block-diagonalization strategy to jointly optimize both parameter count and computational cost. Empirically, eSSM retains modeling capability comparable to S4 while reducing parameters to just 12.89% of LSTM’s and 13.24% of Mamba’s. It matches S4’s performance across multiple benchmarks and significantly outperforms Transformer and LSTM baselines. Moreover, eSSM achieves 3.94Γ— faster training than LSTM and 1.35Γ— faster than Mamba. The implementation is publicly available.

πŸ“ Abstract
Existing models encounter bottlenecks in balancing performance and computational efficiency when modeling long sequences. Although the state space model (SSM) has achieved remarkable success on long-sequence tasks, it still suffers from a large parameter count. To further improve the efficiency of SSMs, we propose a new state space layer based on the multiple-input multiple-output (MIMO) SSM, called the efficient SSM (eSSM). Our eSSM is built on the convolutional representation of the MIMO SSM, and we propose several effective strategies to improve its computational efficiency. Diagonalization of the system matrix first decouples the original system; a fast tensor convolution based on the fast Fourier transform then accelerates computation; and block diagonalization of the SSM further reduces the model parameters and improves model flexibility. Extensive experiments show that the proposed model matches state-of-the-art models such as S4 on multiple benchmarks and significantly outperforms Transformers and LSTM. In the model-efficiency benchmark, eSSM uses only 12.89% of LSTM's parameters and 13.24% of Mamba's, and its training is 3.94 times faster than LSTM and 1.35 times faster than Mamba. Code is available at: https://github.com/leonty1/essm.
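The FFT-based convolution the abstract refers to can be illustrated with a minimal sketch. This is not the paper's implementation: the kernel `K` here is random for demonstration, whereas eSSM derives its kernel from the diagonalized system matrices. The sketch only shows the core trick of replacing an O(L^2) causal convolution with an O(L log L) FFT product.

```python
import numpy as np

def fft_causal_conv(u, K):
    """Causal convolution of input u (length L) with kernel K (length L)
    computed in O(L log L) via FFT instead of O(L^2) directly."""
    L = len(u)
    n = 2 * L  # zero-pad so the circular convolution equals the linear one
    y = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(K, n), n)
    return y[:L]

def naive_causal_conv(u, K):
    """Direct O(L^2) causal convolution, for comparison."""
    L = len(u)
    return np.array([sum(K[j] * u[t - j] for j in range(t + 1))
                     for t in range(L)])

rng = np.random.default_rng(0)
u, K = rng.standard_normal(64), rng.standard_normal(64)
assert np.allclose(fft_causal_conv(u, K), naive_causal_conv(u, K))
```

Zero-padding to twice the sequence length is what makes the circular FFT convolution agree with the linear (causal) one; without it, the tail of the sequence would wrap around and corrupt the early outputs.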
Problem

Research questions and friction points this paper is trying to address.

Improves efficiency of state space models for long sequences
Reduces parameters in MIMO SSM via block diagonalization
Enhances computational speed with fast tensor convolution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fast tensor convolution via Fourier transform
Block diagonalization reduces parameters
MIMO SSM enhances computational efficiency
Tongyi Liang
Department of Systems Engineering, City University of Hong Kong, Hong Kong, SAR, China
Han-Xiong Li
Department of Systems Engineering, City University of Hong Kong, Hong Kong, SAR, China