Parallax to Align Them All: An OmniParallax Attention Mechanism for Distributed Multi-View Image Compression

πŸ“… 2026-03-03
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses a limitation of existing distributed multi-view image compression methods, which fail to adequately model the varying inter-view correlations during decoding, thereby constraining performance. To overcome this, we propose ParaHydra, an end-to-end framework featuring the OmniParallax attention mechanism, which adaptively captures disparity-based correlations between arbitrary view pairs. Additionally, we introduce a Parallax Multi-Information Fusion module to efficiently integrate multi-source information within both the decoder and the entropy model. Our approach is the first to significantly outperform state-of-the-art multi-view codecs under a distributed setting, achieving bitrate savings of 19.72% and 24.18% on the WildTrack(3) and WildTrack(6) datasets, respectively, while accelerating encoding and decoding by up to 34Γ— and 65Γ—.

πŸ“ Abstract
Multi-view image compression (MIC) aims to achieve high compression efficiency by exploiting inter-image correlations, playing a crucial role in 3D applications. As a subfield of MIC, distributed multi-view image compression (DMIC) offers performance comparable to MIC while eliminating the need for inter-view information at the encoder side. However, existing DMIC methods typically treat all images equally, overlooking the varying degrees of correlation between different views during decoding, which leads to suboptimal coding performance. To address this limitation, we propose a novel OmniParallax Attention Mechanism (OPAM), a general mechanism for explicitly modeling correlations and aligned features between arbitrary pairs of information sources. Building upon OPAM, we propose a Parallax Multi-Information Fusion Module (PMIFM) to adaptively integrate information from different sources. PMIFM is incorporated into both the joint decoder and the entropy model to construct our end-to-end DMIC framework, ParaHydra. Extensive experiments demonstrate that ParaHydra is the first DMIC method to significantly surpass state-of-the-art MIC codecs, while maintaining low computational overhead. Performance gains become more pronounced as the number of input views increases. Compared with LDMIC, ParaHydra achieves bitrate savings of 19.72% on WildTrack(3) and up to 24.18% on WildTrack(6), while significantly improving coding efficiency (as much as 65Γ— in decoding and 34Γ— in encoding).
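The abstract describes OPAM as explicitly modeling disparity-based correlations and aligned features between arbitrary pairs of views. The paper's actual architecture is not reproduced here, but the general idea it builds on (soft cross-view alignment via attention over spatial positions) can be sketched in a few lines. All names, shapes, and the toy "parallax" below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_align(target_feat, ref_feat):
    """Warp reference-view features toward the target view with
    scaled dot-product cross-attention over spatial positions.

    target_feat: (N, C) flattened feature map of the view being decoded
    ref_feat:    (M, C) flattened feature map of a reference view
    Returns an (N, C) map of reference features aligned to the target.
    (A trained model would apply learned Q/K/V projections first;
    they are omitted here for brevity.)
    """
    C = target_feat.shape[1]
    scores = target_feat @ ref_feat.T / np.sqrt(C)  # (N, M) correlation
    attn = softmax(scores, axis=-1)                 # soft disparity match
    return attn @ ref_feat                          # aligned features

# Toy example: a 4x4 feature map with 8 channels; the "reference view"
# is the same content shifted by 2 positions, a crude stand-in for parallax.
rng = np.random.default_rng(0)
tgt = rng.standard_normal((16, 8))
ref = np.roll(tgt, shift=2, axis=0)
aligned = cross_view_align(tgt, ref)
print(aligned.shape)  # (16, 8)
```

In this sketch the attention map plays the role of a soft disparity estimate: each target position gathers the reference positions whose features correlate with it, which is the kind of pairwise correlation modeling the abstract attributes to OPAM.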
Problem

Research questions and friction points this paper is trying to address.

Distributed Multi-View Image Compression
Inter-view Correlation
Compression Efficiency
Multi-view Image Compression
Decoding Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

OmniParallax Attention
Distributed Multi-View Image Compression
Parallax Multi-Information Fusion
End-to-End DMIC
View Correlation Modeling
Haotian Zhang
University of Science and Technology of China
Educational Data Mining
Feiyue Long
The National Engineering Laboratory for Video Technology, School of Computer Science, Peking University
Yixin Yu
The National Engineering Laboratory for Video Technology, School of Computer Science, Peking University
Jian Xue
Professor of Computer Applied Technology, University of Chinese Academy of Sciences
Image Processing, Computer Graphics, Visualization
Haocheng Tang
The National Engineering Laboratory for Video Technology, School of Computer Science, Peking University
Tongda Xu
PhD candidate, Tsinghua University
image & video compression, perceptual quality, old Beijing & internet cafΓ© enthusiast
Zhenning Shi
Huawei Tech. Ltd.
5G, NFV, SDN, Cloud RAN, MCPTT
Yan Wang
Tsinghua University; SenseTime
Neural Compression, Computer Vision, Machine Learning
Siwei Ma
Peking University
Video Coding and Processing
Jiaqi Zhang
The National Engineering Laboratory for Video Technology, School of Computer Science, Peking University