Beyond Subspace Isolation: Many-to-Many Transformer for Light Field Image Super-resolution

📅 2024-01-01
🏛️ IEEE Transactions on Multimedia
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing light field super-resolution methods decompose 4D data into lower-dimensional subspaces for separate processing, restricting self-attention to local sub-apertures and impeding global spatial-angular dependency modeling, a limitation we term "subspace isolation." To address this, we propose the Many-to-Many Transformer (M2MT), which first aggregates angular information in the spatial domain and then applies full-sub-aperture many-to-many self-attention, breaking the conventional one-to-one local constraint to enable long-range joint spatial-angular modeling. Our method integrates a lightweight spatial feature aggregation module with an enhanced Transformer architecture and uses Local Attribution Maps (LAM) to improve interpretability. Evaluated on multiple benchmark datasets, M2MT achieves state-of-the-art performance. Ablation and LAM analyses confirm its capability to effectively capture cross-dimensional global correlations, significantly improving reconstruction accuracy and structural fidelity.

๐Ÿ“ Abstract
The effective extraction of spatial-angular features plays a crucial role in light field image super-resolution (LFSR) tasks, and the introduction of convolutions and Transformers has led to significant improvement in this area. Nevertheless, due to the large 4D data volume of light field images, many existing methods opt to decompose the data into a number of lower-dimensional subspaces and apply Transformers to each subspace individually. As a side effect, these methods inadvertently restrict the self-attention mechanism to a One-to-One scheme that accesses only a limited subset of LF data, preventing comprehensive optimization over all spatial and angular cues. In this paper, we identify this limitation as subspace isolation and introduce a novel Many-to-Many Transformer (M2MT) to address it. M2MT aggregates angular information in the spatial subspace before performing the self-attention mechanism. It enables complete access to all information across all sub-aperture images (SAIs) in a light field image. Consequently, M2MT can comprehensively capture long-range correlation dependencies. With M2MT as the pivotal component, we develop a simple yet effective M2MT network for LFSR. Our experimental results demonstrate that M2MT achieves state-of-the-art performance across various public datasets. We further conduct in-depth analysis using local attribution maps (LAM) to obtain visual interpretability, and the results validate that M2MT is empowered with a truly non-local context in both spatial and angular subspaces to mitigate subspace isolation and acquire effective spatial-angular representation.
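The One-to-One vs Many-to-Many distinction described above can be sketched with a toy attention computation: under subspace isolation, self-attention runs within each sub-aperture image (SAI) separately, whereas the many-to-many scheme pools tokens from all SAIs into a single attention pass. This is a minimal NumPy sketch of that contrast, not the authors' implementation; identity Q/K/V projections and the tiny light-field shape are assumptions made for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # tokens: (N, C). Identity projections stand in for learned Q/K/V.
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    return softmax(scores) @ tokens

# Toy light field: 2x2 angular grid of SAIs, each 4x4 pixels, 8 channels.
A, H, W, C = 2, 4, 4, 8
lf = np.random.default_rng(0).normal(size=(A * A, H * W, C))

# One-to-One (subspace-isolated): attention restricted to each SAI.
per_sai = np.stack([self_attention(sai) for sai in lf])        # (4, 16, 8)

# Many-to-Many: every token attends to tokens from *all* SAIs.
joint = self_attention(lf.reshape(A * A * H * W, C))
joint = joint.reshape(A * A, H * W, C)                         # (4, 16, 8)
```

The output shapes match, but the joint pass mixes information across sub-apertures, which is exactly the long-range spatial-angular access the isolated scheme forbids; the paper's actual module additionally aggregates angular information in the spatial subspace before attention.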
Problem

Research questions and friction points this paper is trying to address.

Addresses subspace isolation in light field image super-resolution.
Introduces Many-to-Many Transformer for comprehensive spatial-angular feature extraction.
Enables full access to all sub-aperture images for better optimization.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Many-to-Many Transformer for comprehensive data access
Aggregates angular information before self-attention mechanism
Achieves state-of-the-art light field image super-resolution
Zeke Zexi Hu
University of Sydney
Computer Vision · Deep Learning · Machine Learning
Xiaoming Chen
School of Computer Science and Engineering, Beijing Technology and Business University, Beijing 102488, China
Yuk Ying Chung
School of Computer Science, University of Sydney, Darlington, NSW 2008, Australia
Yiran Shen
School of Software, Shandong University
Mobile Computing · Virtual Reality