Adaptive Agent Selection and Interaction Network for Image-to-point cloud Registration

Date: 2025-11-08
Citations: 0
Influential: 0
AI Summary
Existing detection-free image-to-point-cloud registration methods produce erroneous correspondences under noise and lack mechanisms for selecting highly informative cross-modal features, which limits robustness and accuracy. This paper proposes IterProxy, a registration framework based on iterative proxy selection and reliable proxy interaction. Its key contributions are: (1) phase-map encoding to enhance structural awareness; (2) a reinforcement learning-driven mechanism for dynamic cross-modal proxy selection; and (3) a guided cross-modal attention module that suppresses spurious matches. By combining a Transformer architecture with phase-map encoding, IterProxy achieves state-of-the-art performance on the RGB-D Scenes v2 and 7-Scenes benchmarks, improving registration robustness and accuracy in complex, noisy scenarios.

Abstract
Typical detection-free methods for image-to-point cloud registration leverage transformer-based architectures to aggregate cross-modal features and establish correspondences. However, they often struggle under challenging conditions, where noise disrupts similarity computation and leads to incorrect correspondences. Moreover, without dedicated designs, it remains difficult to effectively select informative and correlated representations across modalities, thereby limiting the robustness and accuracy of registration. To address these challenges, we propose a novel cross-modal registration framework composed of two key modules: the Iterative Agents Selection (IAS) module and the Reliable Agents Interaction (RAI) module. IAS enhances structural feature awareness with phase maps and employs reinforcement learning principles to efficiently select reliable agents. RAI then leverages these selected agents to guide cross-modal interactions, effectively reducing mismatches and improving overall robustness. Extensive experiments on the RGB-D Scenes v2 and 7-Scenes benchmarks demonstrate that our method consistently achieves state-of-the-art performance.
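The failure mode described in the abstract, noise disrupting similarity computation, can be illustrated with a minimal sketch. It assumes correspondences are taken as mutual nearest neighbours under cosine similarity, which is a common generic choice and not necessarily what the paper uses. The function name `mutual_nearest_matches`, the feature sizes, and the noise scale are all hypothetical.

```python
import numpy as np

def mutual_nearest_matches(img_feats, pt_feats):
    """Return (image_idx, point_idx) pairs that are mutual nearest
    neighbours under cosine similarity (illustrative matcher only)."""
    a = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    b = pt_feats / np.linalg.norm(pt_feats, axis=1, keepdims=True)
    sim = a @ b.T                    # (num_img, num_pts) similarity matrix
    best_pt = sim.argmax(axis=1)     # best point for each image feature
    best_img = sim.argmax(axis=0)    # best image feature for each point
    return [(i, int(j)) for i, j in enumerate(best_pt) if best_img[j] == i]

rng = np.random.default_rng(0)
pt_feats = rng.normal(size=(6, 32))  # hypothetical point features
perm = rng.permutation(6)
img_feats = pt_feats[perm]           # image feature i corresponds to point perm[i]

# Clean features: every ground-truth pair is recovered.
clean = mutual_nearest_matches(img_feats, pt_feats)

# Heavily perturbed image features: the similarity matrix changes, so the
# recovered pairs can differ from the ground truth or disappear entirely.
noise = rng.normal(scale=5.0, size=img_feats.shape)
noisy = mutual_nearest_matches(img_feats + noise, pt_feats)
```

Comparing `clean` and `noisy` against `perm` shows how many ground-truth pairs survive the perturbation. This is the kind of degradation the proposed modules aim to reduce.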
Problem

Research questions and friction points this paper is trying to address.

Addresses noise that disrupts similarity computation in cross-modal registration
Addresses the lack of a dedicated mechanism for selecting informative, correlated representations across modalities
Improves robustness against incorrect correspondences in image-to-point cloud alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative Agents Selection module enhances structural awareness
Reliable Agents Interaction module guides cross-modal interactions
Framework uses reinforcement learning to select reliable agents
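The idea of selecting a small set of agents and routing cross-modal interaction through them can be sketched as follows. This is an illustration under stated assumptions, not the paper's IAS or RAI implementation. A plain top-k over given scores stands in for the reinforcement learning-based selection, and a generic two-step attention stands in for the guided interaction. The names `select_agents` and `agent_guided_attention` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def select_agents(feats, scores, k):
    """Keep the k highest-scoring tokens as agents.
    Assumption: top-k replaces the learned selection policy."""
    idx = np.argsort(scores)[::-1][:k]
    return feats[idx], idx

def agent_guided_attention(queries, agents, keys_values):
    """Two-step attention: agents first summarise the other modality,
    then each query reads from the agents instead of from every token."""
    d = queries.shape[1]
    agent_ctx = softmax(agents @ keys_values.T / np.sqrt(d)) @ keys_values  # (k, d)
    return softmax(queries @ agents.T / np.sqrt(d)) @ agent_ctx             # (n, d)

rng = np.random.default_rng(0)
img_tokens = rng.normal(size=(50, 16))  # hypothetical image features
pt_tokens = rng.normal(size=(80, 16))   # hypothetical point features
scores = rng.random(50)                 # hypothetical reliability scores

agents, idx = select_agents(img_tokens, scores, k=8)
fused = agent_guided_attention(img_tokens, agents, pt_tokens)
```

Routing through 8 agents means each image token attends to 8 summaries rather than all 80 point tokens, so unreliable tokens influence the result only if they are selected as agents.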
Authors

Zhixin Cheng
School of Information Science and Technology, University of Science and Technology of China

Xiaotian Yin
Research Associate, Harvard University
Discrete algorithms of geometry and topology and their applications

Jiacheng Deng
University of Science and Technology of China
Point cloud, 3D scene perception

Bohao Liao
School of Information Science and Technology, University of Science and Technology of China

Yujia Chen
Institute of Advanced Technology, University of Science and Technology of China

Xu Zhou
Sangfor Technologies

Baoqun Yin
School of Information Science and Technology, University of Science and Technology of China

Tianzhu Zhang
Professor, University of Science and Technology of China; previously Institute of Automation, CAS
Computer Vision, Pattern Recognition, Multimedia Analysis, Machine Learning