🤖 AI Summary
Existing online video super-resolution (VSR) methods predominantly rely on a single previous frame for temporal alignment, limiting their ability to model long-range temporal dependencies. To address this, the authors propose TS-Mamba, a trajectory-aware shifted state space model that explicitly models motion trajectories to aggregate semantically similar feature tokens from historical frames, enabling efficient spatio-temporal fusion. The approach integrates Hilbert-curve scanning, corresponding shift operations, and trajectory-guided aggregation into the Mamba architecture, enhancing both spatial continuity and temporal consistency, and introduces a trajectory-aware loss to supervise token selection. Evaluated on the REDS4, Vid4, and UDM10 benchmarks, TS-Mamba outperforms six state-of-the-art online VSR methods on most metrics while reducing computational cost by over 22.7% in MACs.
📝 Abstract
Online video super-resolution (VSR) is an important technique for many real-world video processing applications; it aims to restore the current high-resolution video frame based on temporally previous frames. Most existing online VSR methods employ only one neighboring previous frame for temporal alignment, which limits long-range temporal modeling of videos. Recently, state space models (SSMs) have been proposed with linear computational complexity and a global receptive field, significantly improving computational efficiency and performance. In this context, this paper presents a novel online VSR method based on Trajectory-aware Shifted SSMs (TS-Mamba), leveraging both long-term trajectory modeling and low-complexity Mamba to achieve efficient spatio-temporal information aggregation. Specifically, TS-Mamba first constructs trajectories within a video to select the most similar tokens from previous frames. Then, a Trajectory-aware Shifted Mamba Aggregation (TSMA) module, consisting of the proposed shifted SSM blocks, aggregates the selected tokens. The shifted SSM blocks are designed based on Hilbert scans and corresponding shift operations to compensate for scanning losses and strengthen the spatial continuity of Mamba. Additionally, we propose a trajectory-aware loss function to supervise trajectory generation, ensuring accurate token selection when training our model. Extensive experiments on three widely used VSR test datasets demonstrate that, compared with six online VSR benchmark models, our TS-Mamba achieves state-of-the-art performance in most cases with over 22.7% complexity reduction (in MACs). The source code for TS-Mamba will be available at https://github.com.
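The spatial-continuity idea behind the Hilbert scanning mentioned above can be illustrated with a small sketch. This is a hypothetical, minimal version for intuition only (the paper's actual implementation is not shown here); it assumes a square feature map whose side length is a power of two, and the function names `hilbert_d2xy` and `hilbert_scan` are illustrative, not from the paper.

```python
# Hypothetical sketch: flatten a 2-D feature map into a 1-D token sequence
# along a Hilbert curve, so consecutive tokens stay spatially adjacent
# (unlike raster scanning, which jumps across the image at row ends).
import numpy as np

def hilbert_d2xy(n: int, d: int) -> tuple:
    """Map a 1-D Hilbert index d to (x, y) on an n x n grid (n a power of two)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the sub-quadrant so the curve remains continuous
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_scan(feat: np.ndarray) -> np.ndarray:
    """Flatten an (n, n, c) feature map into (n*n, c) tokens in Hilbert order."""
    n = feat.shape[0]
    order = [hilbert_d2xy(n, d) for d in range(n * n)]
    return np.stack([feat[y, x] for x, y in order])
```

Along this ordering, every pair of consecutive tokens is a direct spatial neighbor, which is the continuity property the abstract attributes to the Hilbert-scan-based shifted SSM blocks; the accompanying shift operations would then compensate for the positions a single scan still underserves.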