Adaptive Agent Selection and Interaction Network for Image-to-point cloud Registration

📅 2025-11-08

🤖 AI Summary

Existing detection-free image-to-point-cloud registration methods produce erroneous correspondences under noise and lack a mechanism for selecting highly informative cross-modal features, which limits robustness and accuracy. This paper proposes a registration framework built on two modules: Iterative Agents Selection (IAS) and Reliable Agents Interaction (RAI). Its key contributions are: (1) phase maps that enhance structural feature awareness; (2) agent selection based on reinforcement learning principles; and (3) agent-guided cross-modal interaction that reduces mismatches. The method achieves state-of-the-art performance on the RGB-D Scenes v2 and 7-Scenes benchmarks.

📝 Abstract

Typical detection-free methods for image-to-point cloud registration leverage transformer-based architectures to aggregate cross-modal features and establish correspondences. However, they often struggle under challenging conditions, where noise disrupts similarity computation and leads to incorrect correspondences. Moreover, without dedicated designs, it remains difficult to effectively select informative and correlated representations across modalities, thereby limiting the robustness and accuracy of registration. To address these challenges, we propose a novel cross-modal registration framework composed of two key modules: the Iterative Agents Selection (IAS) module and the Reliable Agents Interaction (RAI) module. IAS enhances structural feature awareness with phase maps and employs reinforcement learning principles to efficiently select reliable agents. RAI then leverages these selected agents to guide cross-modal interactions, effectively reducing mismatches and improving overall robustness. Extensive experiments on the RGB-D Scenes v2 and 7-Scenes benchmarks demonstrate that our method consistently achieves state-of-the-art performance.
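The abstract does not say how the phase maps are computed. As a rough illustration of why phase information is associated with structure, the sketch below builds a phase-only reconstruction of a grayscale image with a 2-D Fourier transform. The function name `phase_only_map`, the `eps` parameter, and the toy image are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def phase_only_map(image: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Hypothetical helper: phase-only reconstruction of a 2-D grayscale image."""
    spectrum = np.fft.fft2(image)
    # Drop the magnitude and keep only the (approximately) unit-modulus phase term.
    phase_term = spectrum / (np.abs(spectrum) + eps)
    # For real input the inverse transform is real up to numerical residue.
    return np.real(np.fft.ifft2(phase_term))

# Toy image (assumed for illustration): a bright square on a dark background.
image = np.zeros((32, 32))
image[8:24, 8:24] = 1.0
phase_map = phase_only_map(image)
```

Because the magnitude is discarded, rescaling the image brightness leaves the result essentially unchanged, which is one reason a phase-based representation can be less sensitive to illumination than raw intensities.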

Problem

Research questions and friction points this paper is trying to address.

Addresses noise disrupting similarity computation in cross-modal registration

Addresses the difficulty of selecting informative, correlated representations across modalities

Improves robustness against incorrect correspondences in image-to-point cloud alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative Agents Selection (IAS) module uses phase maps to enhance structural feature awareness

IAS applies reinforcement learning principles to select reliable agents

Reliable Agents Interaction (RAI) module uses the selected agents to guide cross-modal interactions and reduce mismatches
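The general idea of routing cross-modal interaction through a small set of selected agents can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: a greedy top-k cosine-similarity score stands in for the learned, reinforcement-learning-based selection, and the function names, feature sizes, and random features are all hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def select_agents(img_feats: np.ndarray, pts_feats: np.ndarray, k: int) -> np.ndarray:
    """Greedy stand-in for learned selection: keep the k image tokens
    whose best cosine similarity to any point token is highest."""
    img_n = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    pts_n = pts_feats / np.linalg.norm(pts_feats, axis=1, keepdims=True)
    scores = (img_n @ pts_n.T).max(axis=1)   # best match per image token
    return np.argsort(-scores)[:k]           # indices of the k highest scores

def agent_guided_attention(img_feats, pts_feats, agent_idx):
    d = img_feats.shape[1]
    agents = img_feats[agent_idx]                                        # (k, d)
    # Stage 1: each agent aggregates information from the point tokens.
    agent_ctx = softmax(agents @ pts_feats.T / np.sqrt(d)) @ pts_feats   # (k, d)
    # Stage 2: every image token reads the point context only via the agents.
    return softmax(img_feats @ agents.T / np.sqrt(d)) @ agent_ctx        # (N, d)

# Random features (assumed sizes): 64 image tokens, 128 point tokens, 16 channels.
rng = np.random.default_rng(0)
img_feats = rng.normal(size=(64, 16))
pts_feats = rng.normal(size=(128, 16))
agent_idx = select_agents(img_feats, pts_feats, k=8)
fused = agent_guided_attention(img_feats, pts_feats, agent_idx)
```

In this sketch the image tokens never attend to the point tokens directly, so a noisy token outside the selected set cannot form a match on its own. That is the intuition behind using reliable agents to reduce mismatches.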
