RPT-SR: Regional Prior attention Transformer for infrared image Super-Resolution

📅 2026-02-17
🤖 AI Summary
Existing general-purpose super-resolution models struggle to effectively leverage stable spatial scene priors in fixed-view infrared imaging, limiting their performance. To address this, this work proposes a novel Vision Transformer architecture that introduces, for the first time in infrared image super-resolution, an explicit regional prior attention mechanism. The design employs dual tokens—learnable regional prior tokens and local content tokens—to jointly model global structural memory and fine-grained local details. The proposed method demonstrates strong performance across both long-wave (LWIR) and short-wave (SWIR) infrared bands, achieving state-of-the-art results on multiple infrared datasets and confirming its generality and effectiveness.

📝 Abstract
General-purpose super-resolution models, particularly Vision Transformers, have achieved remarkable success but are fundamentally inefficient in common infrared imaging scenarios such as surveillance and autonomous driving, which operate from fixed or nearly static viewpoints. These models fail to exploit the strong, persistent spatial priors inherent in such scenes, leading to redundant learning and suboptimal performance. To address this, we propose the Regional Prior attention Transformer for infrared image Super-Resolution (RPT-SR), a novel architecture that explicitly encodes scene layout information into the attention mechanism. Our core contribution is a dual-token framework that fuses (1) learnable regional prior tokens, which act as a persistent memory for the scene's global structure, with (2) local tokens that capture the frame-specific content of the current input. By feeding both token types into the attention computation, our model allows the priors to dynamically modulate the local reconstruction process. Extensive experiments validate our approach. While most prior works focus on a single infrared band, we demonstrate the broad applicability and versatility of RPT-SR by establishing new state-of-the-art performance across diverse datasets covering both Long-Wave (LWIR) and Short-Wave (SWIR) spectra.
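The abstract does not spell out how the two token types are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that, within a single image region, the local tokens act as queries while the keys and values are the region's prior tokens concatenated with the local tokens. The function name `dual_token_attention`, the identity query/key/value projections, and all sizes are hypothetical choices made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dual_token_attention(local_tokens, prior_tokens):
    """Attention for one region (illustrative assumption, not the paper's code).

    local_tokens: N vectors of length d taken from the current frame.
    prior_tokens: P vectors of length d; in a trained model these would be
                  learnable parameters tied to this region of the scene.
    Returns (outputs, weights): N output vectors and an N x (P + N) weight matrix.
    """
    d = len(local_tokens[0])
    # Keys/values: prior tokens first, then local tokens (identity projections).
    memory = prior_tokens + local_tokens
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for query in local_tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in memory]
        w = softmax(scores)
        # Weighted sum of memory vectors, so priors can modulate each output.
        out = [sum(w[j] * memory[j][c] for j in range(len(memory))) for c in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Toy usage with random values standing in for real features.
random.seed(0)
d, n_local, n_prior = 4, 6, 2
local = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_local)]
prior = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_prior)]
out, attn = dual_token_attention(local, prior)
```

In this sketch the first `n_prior` columns of each row of `attn` indicate how strongly a local token draws on the region's stored prior rather than on the current frame. A real model would add learned projections, multiple heads, and a separate set of prior tokens per region.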
Problem

Research questions and friction points this paper is trying to address.

infrared image super-resolution
spatial priors
fixed viewpoint
Vision Transformers
scene layout
Innovation

Methods, ideas, or system contributions that make the work stand out.

Regional Prior Tokens
Infrared Image Super-Resolution
Vision Transformer
Scene Layout Encoding
Dual-Token Framework
Youngwan Jin
Yonsei University
Incheol Park
Yonsei University
Yagiz Nalcakan
Yonsei University
Hyeongjin Ju
Yonsei University
Sanghyeop Yeo
Yonsei University
Shiho Kim
School of Integrated Technology, Yonsei University
Intelligent semiconductors, Intelligent Vehicles, Artificial Intelligence, QML