Small Clips, Big Gains: Learning Long-Range Refocused Temporal Information for Video Super-Resolution

📅 2025-05-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Long-video super-resolution (VSR) faces significant challenges in modeling long-range temporal dependencies. To address this, the authors propose LRTI-VSR, a training framework that lets recurrent VSR models efficiently learn Long-Range Refocused Temporal Information: temporal propagation features collected from long video clips are reused while training on short clips. Methodologically, a refocused intra&inter-frame transformer block combines intra-frame and inter-frame attention, allowing the model to selectively prioritize useful temporal information, while its feed-forward network (FFN) further improves inter-frame information utilization. Applied to both CNN- and transformer-based VSR architectures, LRTI-VSR achieves state-of-the-art results on long-video test sets, improving restoration fidelity while maintaining training and computational efficiency.

📝 Abstract
Video super-resolution (VSR) can achieve better performance compared to single image super-resolution by additionally leveraging temporal information. In particular, the recurrent-based VSR model exploits long-range temporal information during inference and achieves superior detail restoration. However, effectively learning these long-term dependencies within long videos remains a key challenge. To address this, we propose LRTI-VSR, a novel training framework for recurrent VSR that efficiently leverages Long-Range Refocused Temporal Information. Our framework includes a generic training strategy that utilizes temporal propagation features from long video clips while training on shorter video clips. Additionally, we introduce a refocused intra&inter-frame transformer block which allows the VSR model to selectively prioritize useful temporal information through its attention module while further improving inter-frame information utilization in the FFN module. We evaluate LRTI-VSR on both CNN and transformer-based VSR architectures, conducting extensive ablation studies to validate the contribution of each component. Experiments on long-video test sets demonstrate that LRTI-VSR achieves state-of-the-art performance while maintaining training and computational efficiency.
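The core training idea from the abstract — exposing the model to realistic long-range propagation features while only backpropagating through short clips — can be sketched as follows. This is a minimal illustration, not the paper's implementation: a toy blending update stands in for the actual recurrent VSR backbone, and all function names (`propagate`, `collect_long_range_state`, `train_step_on_short_clip`) are hypothetical.

```python
import numpy as np

def propagate(frame, hidden, w=0.9):
    # Toy stand-in for recurrent feature propagation:
    # blend the current frame into the hidden state.
    return w * hidden + (1.0 - w) * frame

def collect_long_range_state(long_clip, clip_start):
    # Warm up the hidden state over all frames preceding the
    # short training clip (gradient-free in the real setting),
    # so it carries genuine long-video temporal information.
    hidden = np.zeros_like(long_clip[0])
    for t in range(clip_start):
        hidden = propagate(long_clip[t], hidden)
    return hidden

def train_step_on_short_clip(long_clip, clip_start, clip_len):
    # Initialize the short clip with the long-range hidden state
    # instead of zeros, then run (and in practice, train on) only
    # clip_len frames.
    hidden = collect_long_range_state(long_clip, clip_start)
    outputs = []
    for t in range(clip_start, clip_start + clip_len):
        hidden = propagate(long_clip[t], hidden)
        outputs.append(hidden)
    return outputs
```

The point of the sketch is the initialization: the short clip starts from a propagated state rather than a cold one, so the model learns dependencies it would otherwise only encounter at inference time on long videos.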
Problem

Research questions and friction points this paper is trying to address.

Learning long-range temporal dependencies in video super-resolution
Effectively utilizing temporal information from long videos
Improving inter-frame information utilization in VSR models
Innovation

Methods, ideas, or system contributions that make the work stand out.

LRTI-VSR framework for recurrent VSR training
Refocused intra&inter-frame transformer block
Training on short clips with long-range features
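The refocused intra&inter-frame transformer block listed above can be sketched at token level. This is a hedged toy version assuming single-head scaled dot-product attention and a plain two-layer MLP in place of the paper's FFN design; `intra_inter_block` and its weight arguments are illustrative names, not the authors' API.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Single-head scaled dot-product attention over the token axis.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def intra_inter_block(cur_tokens, prev_tokens, W1, W2):
    # Intra-frame attention: current-frame tokens attend to themselves.
    intra = attention(cur_tokens, cur_tokens, cur_tokens)
    # Inter-frame attention: current tokens "refocus" on propagated
    # features from earlier frames, selecting useful temporal cues.
    inter = attention(cur_tokens, prev_tokens, prev_tokens)
    fused = cur_tokens + intra + inter
    # FFN stage (in the paper the FFN further improves inter-frame
    # utilization; a plain ReLU MLP stands in for it here).
    hidden = np.maximum(fused @ W1, 0.0)
    return fused + hidden @ W2
```

The two attention paths are the key structural idea: one models spatial detail within a frame, the other injects temporal information from the propagated features, before the FFN mixes the result.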
Xingyu Zhou
University of Electronic Science and Technology of China
Wei Long
University of Electronic Science and Technology of China
Jingbo Lu
University of Electronic Science and Technology of China
image compression
Shiyin Jiang
University of Electronic Science and Technology of China
Weiyi You
University of Electronic Science and Technology of China
visual super-resolution, visual generation
Haifeng Wu
University of Electronic Science and Technology of China
Shuhang Gu
University of Electronic Science and Technology of China
image processing, pattern recognition, computer vision