🤖 AI Summary
Existing online video super-resolution (VSR) methods predominantly rely on a single previous frame for temporal alignment, limiting their ability to model long-range temporal dependencies. To address this, the authors propose TS-Mamba, a trajectory-aware shifted state space model that explicitly models motion trajectories to aggregate semantically similar feature tokens from historical frames, enabling efficient spatio-temporal fusion. The approach integrates Hilbert-curve scanning, corresponding shift operations, and trajectory-guided aggregation into the Mamba architecture, enhancing both spatial continuity and temporal consistency, and introduces a trajectory-aware loss to supervise token selection. Evaluated on the REDS4, Vid4, and UDM10 benchmarks, TS-Mamba outperforms six state-of-the-art online VSR methods on most metrics while reducing computational cost by over 22.7% in MACs.
📝 Abstract
Online video super-resolution (VSR) is an important technique for many real-world video processing applications; it aims to restore the current high-resolution video frame based on temporally previous frames. Most existing online VSR methods employ only one neighboring previous frame for temporal alignment, which limits long-range temporal modeling of videos. Recently, state space models (SSMs) have been proposed with linear computational complexity and a global receptive field, significantly improving computational efficiency and performance. In this context, this paper presents a novel online VSR method based on Trajectory-aware Shifted SSMs (TS-Mamba), leveraging both long-term trajectory modeling and low-complexity Mamba to achieve efficient spatio-temporal information aggregation. Specifically, TS-Mamba first constructs trajectories within a video to select the most similar tokens from previous frames. Then, a Trajectory-aware Shifted Mamba Aggregation (TSMA) module, consisting of the proposed shifted SSM blocks, aggregates the selected tokens. The shifted SSM blocks are designed based on Hilbert scans and corresponding shift operations to compensate for scanning losses and strengthen the spatial continuity of Mamba. Additionally, we propose a trajectory-aware loss function to supervise trajectory generation, ensuring accurate token selection when training our model. Extensive experiments on three widely used VSR test datasets demonstrate that, compared with six online VSR benchmark models, our TS-Mamba achieves state-of-the-art performance in most cases with over 22.7% complexity reduction (in MACs). The source code for TS-Mamba will be available at https://github.com.
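The spatial-continuity idea behind the Hilbert scanning mentioned above can be illustrated with a small sketch. This is a hypothetical, minimal version for intuition only (the paper's actual implementation is not shown here); it assumes a square feature map whose side length is a power of two, and the function names `hilbert_d2xy` and `hilbert_scan` are illustrative, not from the paper.

```python
# Hypothetical sketch: flatten a 2-D feature map into a 1-D token sequence
# along a Hilbert curve, so consecutive tokens stay spatially adjacent
# (unlike raster scanning, which jumps across the image at row ends).
import numpy as np

def hilbert_d2xy(n: int, d: int) -> tuple:
    """Map a 1-D Hilbert index d to (x, y) on an n x n grid (n a power of two)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the sub-quadrant so the curve remains continuous
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_scan(feat: np.ndarray) -> np.ndarray:
    """Flatten an (n, n, c) feature map into (n*n, c) tokens in Hilbert order."""
    n = feat.shape[0]
    order = [hilbert_d2xy(n, d) for d in range(n * n)]
    return np.stack([feat[y, x] for x, y in order])
```

Along this ordering, every pair of consecutive tokens is a direct spatial neighbor, which is the continuity property the abstract attributes to the Hilbert-scan-based shifted SSM blocks; the accompanying shift operations would then compensate for the positions a single scan still underserves.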