Efficient Multi-disparity Transformer for Light Field Image Super-resolution

📅 2024-07-22
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing light field (LF) super-resolution methods model all sub-aperture images (SAIs) uniformly, leading to parallax entanglement and computational redundancy. To address this, we propose the Multi-scale Disparity Transformer (MDT), the first disparity-aware "divide-and-conquer" Transformer architecture for LF SR. MDT explicitly disentangles parallax through a multi-branch structure in which each branch applies an independent disparity self-attention (DSA) mechanism to a distinct disparity range. Integrated with SAI-wise collaborative modeling and multi-scale feature fusion, it forms the lightweight LF-MDTNet. On 2× and 4× LF SR tasks, LF-MDTNet achieves PSNR gains of +0.37 dB and +0.41 dB over state-of-the-art methods, respectively, while reducing model parameters by 23% and accelerating inference by 1.8×. The approach thus advances accuracy, efficiency, and interpretability, enabling explicit parallax-aware representation learning in LF super-resolution.

📝 Abstract
This paper presents the Multi-scale Disparity Transformer (MDT), a novel Transformer tailored for light field image super-resolution (LFSR) that addresses the issues of computational redundancy and disparity entanglement caused by the indiscriminate processing of sub-aperture images inherent in conventional methods. MDT features a multi-branch structure, with each branch utilising independent disparity self-attention (DSA) to target specific disparity ranges, effectively reducing computational complexity and disentangling disparities. Building on this architecture, we present LF-MDTNet, an efficient LFSR network. Experimental results demonstrate that LF-MDTNet outperforms existing state-of-the-art methods by 0.37 dB and 0.41 dB PSNR at the 2x and 4x scales, achieving superior performance with fewer parameters and higher speed.
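To make the multi-branch idea concrete, here is a minimal numpy sketch of branch-wise self-attention over SAI token subsets. It is an illustration of the general "divide-and-conquer" pattern the abstract describes, not the paper's actual DSA implementation: the function names, the use of plain slices to stand in for disparity ranges, and the additive fusion are all assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def branch_attention(tokens, wq, wk, wv):
    """Plain scaled dot-product self-attention over one token subset."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scale = np.sqrt(q.shape[-1])
    return softmax(q @ k.T / scale) @ v

def multi_branch_disparity_attention(sai_tokens, branch_slices, branch_weights):
    """Each branch attends only within its own subset of SAI tokens
    (a stand-in for one disparity range), keeping the per-branch
    attention matrix small; branch outputs are fused by summation."""
    out = np.zeros_like(sai_tokens)
    for sl, (wq, wk, wv) in zip(branch_slices, branch_weights):
        out[sl] += branch_attention(sai_tokens[sl], wq, wk, wv)
    return out
```

Because each branch only forms an attention matrix over its own subset, attention cost drops from quadratic in the full token count to quadratic per subset, which is the efficiency argument the abstract makes for disentangling disparity ranges.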
Problem

Research questions and friction points this paper is trying to address.

Addresses data redundancy in light field images
Resolves disparity entanglement in image processing
Improves efficiency in light field super-resolution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-branch Transformer with independent disparity self-attention (DSA) per branch
Each branch targets a specific disparity range, disentangling disparities
Lightweight LF-MDTNet network built on MDT for efficient super-resolution
Zeke Zexi Hu
University of Sydney
Computer Vision · Deep Learning · Machine Learning
Haodong Chen
The School of Computer Science, University of Sydney, Darlington, NSW, Australia
Yuk Ying Chung
The School of Computer Science, University of Sydney, Darlington, NSW, Australia
Xiaoming Chen
The School of Computer and Artificial Intelligence, Beijing Technology and Business University, Beijing, China