CMHANet: A Cross-Modal Hybrid Attention Network for Point Cloud Registration

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing point cloud registration methods suffer significant performance degradation under real-world challenges such as incomplete data, noise corruption, and low overlap. To address this, the paper proposes a cross-modal hybrid attention network that fuses 2D image context with 3D point cloud geometry and introduces a contrastive-learning-driven optimization objective to enforce geometric consistency. By leveraging cross-modal feature alignment and a hybrid attention mechanism, the method substantially improves robustness to noise and partial observations. Extensive experiments demonstrate state-of-the-art registration accuracy and generalization across multiple benchmarks, including 3DMatch, 3DLoMatch, and zero-shot evaluation on the TUM RGB-D SLAM dataset.

📝 Abstract
Robust point cloud registration is a fundamental task in 3D computer vision and geometric deep learning, essential for applications such as large-scale 3D reconstruction, augmented reality, and scene understanding. However, the performance of established learning-based methods often degrades in complex, real-world scenarios characterized by incomplete data, sensor noise, and low-overlap regions. To address these limitations, we propose CMHANet, a novel Cross-Modal Hybrid Attention Network. Our method fuses rich contextual information from 2D images with the geometric detail of 3D point clouds, yielding a comprehensive and resilient feature representation. Furthermore, we introduce an innovative optimization function based on contrastive learning, which enforces geometric consistency and significantly improves the model's robustness to noise and partial observations. We evaluated CMHANet on the 3DMatch and the challenging 3DLoMatch datasets. Additionally, zero-shot evaluations on the TUM RGB-D SLAM dataset verify the model's generalization capability to unseen domains. The experimental results demonstrate that our method achieves substantial improvements in both registration accuracy and overall robustness, outperforming current techniques. Our code is available at https://github.com/DongXu-Zhang/CMHANet.
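The page does not detail the paper's contrastive-learning objective. As a rough illustration only, a common way to enforce feature-level consistency between corresponding points is an InfoNCE-style loss, where matched source/target point features form positive pairs and all other pairs act as negatives. The function name, the temperature value, and the toy data below are assumptions for the sketch, not the paper's actual implementation:

```python
import numpy as np

def info_nce_loss(src_feats, tgt_feats, temperature=0.07):
    """InfoNCE-style contrastive loss over matched point features.

    src_feats, tgt_feats: (N, D) arrays where row i of src_feats and
    row i of tgt_feats describe corresponding points (positive pairs);
    every other cross pair serves as a negative.
    """
    # L2-normalize so dot products become cosine similarities
    src = src_feats / np.linalg.norm(src_feats, axis=1, keepdims=True)
    tgt = tgt_feats / np.linalg.norm(tgt_feats, axis=1, keepdims=True)

    logits = src @ tgt.T / temperature           # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # positives sit on the diagonal; minimize their negative log-likelihood
    return float(-np.mean(np.diag(log_prob)))

# toy check: perfectly matched features yield a much lower loss
rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 32))
loss_matched = info_nce_loss(feats, feats)
loss_random = info_nce_loss(feats, rng.normal(size=(8, 32)))
```

In registration pipelines, a loss of this shape pulls features of geometrically corresponding points together while pushing non-corresponding ones apart, which is one plausible reading of the "geometric consistency" objective described in the abstract.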
Problem

Research questions and friction points this paper is trying to address.

point cloud registration
robustness
sensor noise
partial overlap
3D reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-Modal Fusion
Hybrid Attention
Point Cloud Registration
Contrastive Learning
Geometric Consistency
Dongxu Zhang
Optum AI, PhD from UMass Amherst
LLMs · natural language processing · representation learning · machine learning
Yingsen Wang
CEPRI, China Electric Power Research Institute No. 15 Xiaoying East Road, Qinghe, Haidian District, Beijing, China
Yiding Sun
Renmin University of China
Large Language Models · Explainable Recommendation
Haoran Xu
School of Software, Xi’an Jiaotong University, No. 28 Xianning West Road, China
Peilin Fan
School of Software, Xi’an Jiaotong University, No. 28 Xianning West Road, China
Jihua Zhu
School of Software, Xi’an Jiaotong University, No. 28 Xianning West Road, China