Hadamard Attention Recurrent Transformer: A Strong Baseline for Stereo Matching Transformer

📅 2025-01-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address low matching accuracy in weak-texture and specular regions, limited representational capacity of Transformers due to attention matrix low-rankness and quadratic complexity, insufficient focus on salient keypoints, and slow inference, this paper proposes an efficient and robust stereo matching method. We introduce a novel Hadamard-product attention mechanism that reduces computational complexity from quadratic to linear; design a Dense Attention Kernel (DAK) to enhance discriminability and mitigate low-rank degradation; propose a Multi-scale Kernel-Oriented Interaction (MKOI) module to jointly model spatial-channel dependencies via multi-scale convolutions; and adopt a recurrent Transformer architecture to improve feature reuse. Evaluated on the KITTI 2012 specular region benchmark, our method achieves state-of-the-art performance, significantly improving matching accuracy in weak-texture and highly specular scenes while maintaining real-time efficiency and strong modeling capability.

📝 Abstract
In light of the advancements in transformer technology, extant research posits the construction of stereo transformers as a potential solution to the binocular stereo matching challenge. However, constrained by the low-rank bottleneck and quadratic complexity of attention mechanisms, stereo transformers still fail to demonstrate sufficient nonlinear expressiveness within a reasonable inference time. The lack of focus on key homonymous points renders the representations of such methods vulnerable to challenging conditions, including reflections and weak textures. Furthermore, slow computing speed hinders practical application. To overcome these difficulties, we present the Hadamard Attention Recurrent Stereo Transformer (HART), which incorporates the following components: 1) For faster inference, we present a Hadamard-product paradigm for the attention mechanism, achieving linear computational complexity. 2) We design a Dense Attention Kernel (DAK) to amplify the differences between relevant and irrelevant feature responses, allowing HART to focus on important details. DAK also converts zero elements to non-zero elements, mitigating the reduced expressiveness caused by the low-rank bottleneck. 3) To compensate for the spatial and channel interaction missing in the Hadamard product, we propose the Multi-scale Kernel-Oriented Interaction (MKOI) module to capture both global and local information through the interleaving of large- and small-kernel convolutions. Experimental results demonstrate the effectiveness of HART. In reflective areas, HART ranked 1st on the KITTI 2012 benchmark among all published methods at the time of submission. Code is available at https://github.com/ZYangChen/HART.
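To make the complexity claim concrete: standard attention forms an N x N similarity matrix (softmax(QK^T)V, quadratic in sequence length N), whereas a Hadamard-product formulation combines Q and K element-wise, which is linear in N. The paper's exact DAK formulation is not reproduced here; the snippet below is a minimal NumPy sketch of the general idea, with `np.exp` standing in as a hypothetical dense kernel that maps zero responses to non-zero values.

```python
import numpy as np

def hadamard_attention(Q, K, V):
    """Sketch of Hadamard-product attention (not the paper's exact method).

    Standard attention: softmax(Q @ K.T) @ V costs O(N^2 * d).
    Here the N x N similarity matrix is replaced by an element-wise
    product of Q and K, so the cost is O(N * d).
    """
    # Dense kernel (exp as a stand-in for DAK): every entry becomes
    # non-zero, which counteracts low-rank degradation of the scores.
    A = np.exp(Q * K)                       # element-wise scores, shape (N, d)
    A = A / A.sum(axis=-1, keepdims=True)   # normalize per token
    return A * V                            # gate values element-wise

N, d = 8, 16
rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, N, d))
out = hadamard_attention(Q, K, V)
print(out.shape)  # (8, 16)
```

Note that, as the abstract points out, this element-wise form loses the cross-position (spatial) and cross-channel mixing of full attention, which is what the MKOI convolution module is introduced to restore.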
Problem

Research questions and friction points this paper is trying to address.

Stereoscopic Image Matching
Accuracy Improvement
Processing Speed Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

HART
Dense Attention Kernel (DAK)
Multi-scale Kernel-Oriented Interaction (MKOI)
Ziyang Chen
Yongjun Zhang
College of Computer Science, the State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
Wenting Li
School of Information Engineering, Guizhou University of Commerce, Guiyang 550021, China
Bingshu Wang
Yabo Wu
College of Computer Science, the State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
Yong Zhao
Key Laboratory of Integrated Microsystems, Shenzhen Graduate School, Peking University, Shenzhen 518055, China
C. L. Philip Chen
School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641, China