FS-I2P: A Hierarchical Focus-Sweep Registration Network with Dynamically Allocated Depth

📅 2026-05-08
🤖 AI Summary
This work addresses scale ambiguity and erroneous correspondences in image-to-point-cloud registration, which arise from viewpoint variations, cross-modal discrepancies, and repetitive textures. The authors propose a “Focus-Sweep” registration paradigm and build a hierarchical interaction module on a state-space model (SSM). Inspired by human behavior, the method also introduces a dynamic layer allocation strategy that adaptively determines the iteration depth, strengthening multi-level cross-modal feature association while mitigating attention drift across layers and intra-scale inconsistency. On the RGB-D Scenes V2 and 7-Scenes benchmarks, the approach is reported to achieve state-of-the-art performance.
📝 Abstract
Image-to-point cloud registration is often challenged by viewpoint changes, cross-modal discrepancies, and repetitive textures, which induce scale ambiguity and consequently lead to erroneous correspondences. Recent detection-free methods alleviate this issue by leveraging multi-scale features and transformer-based interactions. However, they still suffer from attention drift across layers and intra-scale inconsistencies, hindering precise registration. Inspired by human behavior, we propose a "Focus-Sweep" paradigm and develop a Hierarchical Focus-Sweep Interaction Module within an SSM-based framework to enhance multi-level cross-modal feature association. In addition, we introduce a Dynamic Layer Allocation Strategy that adaptively determines the iteration depth to better exploit geometric constraints and improve matching robustness. Extensive experiments and ablations on two benchmarks, RGB-D Scenes V2 and 7-Scenes, demonstrate that our approach achieves state-of-the-art performance.
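The abstract does not say how the Dynamic Layer Allocation Strategy chooses its depth. As a rough illustration of the general idea of adaptive iteration depth, the sketch below applies a refinement layer repeatedly and stops once the update becomes small. Everything in it is an assumption made for illustration: the function names `refine_with_adaptive_depth` and `toy_layer`, the `tol` and `max_depth` parameters, and the max-change stopping rule are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def refine_with_adaptive_depth(
    features: Vector,
    layer: Callable[[Vector], Vector],
    max_depth: int = 8,
    tol: float = 1e-3,
) -> Tuple[Vector, int]:
    """Apply `layer` repeatedly and return (features, depth actually used).

    Hypothetical stopping rule (an assumption, not the paper's criterion):
    stop as soon as the largest per-element change falls below `tol`,
    or when `max_depth` layers have been applied.
    """
    depth = 0
    for depth in range(1, max_depth + 1):
        updated = layer(features)
        # Largest absolute change produced by this layer.
        change = max(abs(u - f) for u, f in zip(updated, features))
        features = updated
        if change < tol:
            break
    return features, depth


def toy_layer(x: Vector) -> Vector:
    # Stand-in for a real interaction layer: simply halves every value.
    return [0.5 * v for v in x]


if __name__ == "__main__":
    out, used = refine_with_adaptive_depth([1.0, 0.5], toy_layer, tol=0.1)
    print(f"stopped after {used} layers, features = {out}")
```

In this toy setting the depth depends on the input: features that settle quickly use fewer layers, while `max_depth` bounds the cost for inputs that do not settle.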
Problem

Research questions and friction points this paper is trying to address.

image-to-point cloud registration
scale ambiguity
cross-modal discrepancy
attention drift
intra-scale inconsistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Focus-Sweep
Hierarchical Interaction
Dynamic Layer Allocation
Image-to-Point Cloud Registration
State Space Model
Zhixin Cheng
School of Computer Science and Information Engineering, Hefei University of Technology
Yujia Chen
University of Science and Technology of China
Computer Vision
Xujing Tao
School of Information Science and Technology, University of Science and Technology of China
Bohao Liao
School of Information Science and Technology, University of Science and Technology of China
Xiaotian Yin
Research Associate, Harvard University
Discrete algorithms of geometry and topology and their applications
Baoqun Yin
School of Information Science and Technology, University of Science and Technology of China
Tianzhu Zhang
Professor, University of Science and Technology of China; previously Institute of Automation, CAS
Computer Vision, Pattern Recognition, Multimedia Analysis, Machine Learning