CMHANet: A Cross-Modal Hybrid Attention Network for Point Cloud Registration

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing point cloud registration methods suffer significant performance degradation under real-world challenges such as incomplete data, noise corruption, and low overlap. To address this, the paper proposes a cross-modal hybrid attention network that fuses 2D image context with 3D point cloud geometry and introduces a contrastive-learning-driven optimization objective to enforce geometric consistency. By leveraging cross-modal feature alignment and a hybrid attention mechanism, the method substantially improves robustness to noise and partial observations. Extensive experiments demonstrate state-of-the-art registration accuracy and generalization across multiple benchmarks, including 3DMatch, 3DLoMatch, and zero-shot evaluation on the TUM RGB-D SLAM dataset.

📝 Abstract
Robust point cloud registration is a fundamental task in 3D computer vision and geometric deep learning, essential for applications such as large-scale 3D reconstruction, augmented reality, and scene understanding. However, the performance of established learning-based methods often degrades in complex, real-world scenarios characterized by incomplete data, sensor noise, and low-overlap regions. To address these limitations, we propose CMHANet, a novel Cross-Modal Hybrid Attention Network. Our method fuses rich contextual information from 2D images with the geometric detail of 3D point clouds, yielding a comprehensive and resilient feature representation. Furthermore, we introduce an innovative optimization function based on contrastive learning, which enforces geometric consistency and significantly improves the model's robustness to noise and partial observations. We evaluated CMHANet on the 3DMatch and the challenging 3DLoMatch datasets. Additionally, zero-shot evaluations on the TUM RGB-D SLAM dataset verify the model's generalization capability to unseen domains. The experimental results demonstrate that our method achieves substantial improvements in both registration accuracy and overall robustness, outperforming current techniques. Our code is available at https://github.com/DongXu-Zhang/CMHANet.
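The page does not detail the paper's contrastive-learning objective. As a rough illustration only, a common way to enforce feature-level consistency between corresponding points is an InfoNCE-style loss, where matched source/target point features form positive pairs and all other pairs act as negatives. The function name, the temperature value, and the toy data below are assumptions for the sketch, not the paper's actual implementation:

```python
import numpy as np

def info_nce_loss(src_feats, tgt_feats, temperature=0.07):
    """InfoNCE-style contrastive loss over matched point features.

    src_feats, tgt_feats: (N, D) arrays where row i of src_feats and
    row i of tgt_feats describe corresponding points (positive pairs);
    every other cross pair serves as a negative.
    """
    # L2-normalize so dot products become cosine similarities
    src = src_feats / np.linalg.norm(src_feats, axis=1, keepdims=True)
    tgt = tgt_feats / np.linalg.norm(tgt_feats, axis=1, keepdims=True)

    logits = src @ tgt.T / temperature           # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # positives sit on the diagonal; minimize their negative log-likelihood
    return float(-np.mean(np.diag(log_prob)))

# toy check: perfectly matched features yield a much lower loss
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 32))
loss_matched = info_nce_loss(feats, feats)
loss_random = info_nce_loss(feats, rng.normal(size=(8, 32)))
```

In registration pipelines, a loss of this shape pulls features of geometrically corresponding points together while pushing non-corresponding ones apart, which is one plausible reading of the "geometric consistency" objective described in the abstract.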
Problem

Research questions and friction points this paper is trying to address.

point cloud registration
robustness
sensor noise
partial overlap
3D reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-Modal Fusion
Hybrid Attention
Point Cloud Registration
Contrastive Learning
Geometric Consistency
Dongxu Zhang
Optum AI, PhD from UMass Amherst
LLMs · natural language processing · representation learning · machine learning
Yingsen Wang
CEPRI, China Electric Power Research Institute No. 15 Xiaoying East Road, Qinghe, Haidian District, Beijing, China
Yiding Sun
Renmin University of China
Large Language Models · Explainable Recommendation
Haoran Xu
School of Software, Xi’an Jiaotong University, No. 28 Xianning West Road, China
Peilin Fan
School of Software, Xi’an Jiaotong University, No. 28 Xianning West Road, China
Jihua Zhu
School of Software, Xi’an Jiaotong University, No. 28 Xianning West Road, China