🤖 AI Summary
LiDAR place recognition faces two key challenges: (1) inconsistent point cloud density across repeated traversals—caused by ego-motion and environmental disturbances—leading to unstable descriptor learning; and (2) fragile geometric representations arising from reliance on single-layer abstractions, resulting in insufficient discriminability in structurally complex scenes. To address these, we propose a density-invariant, implicit subgraph-driven framework. We introduce an elastic-point implicit 3D representation to explicitly decouple density-induced interference. Our hierarchical geometric descriptor jointly encodes bird’s-eye-view (BEV) macro-layout and 3D surface micro-geometry across complementary views (BEV + 3D segment). Leveraging implicit geometric modeling, joint encoding of surface normals and occupancy grids, and a multi-scale feature fusion network, our method achieves state-of-the-art performance on KITTI, KITTI-360, MulRan, and NCLT—demonstrating superior accuracy, real-time inference speed, and memory-efficient storage of historical maps.
📝 Abstract
LiDAR-based place recognition serves as a crucial enabler for long-term autonomy in robotics and autonomous driving systems. Yet, prevailing methodologies relying on handcrafted feature extraction face dual challenges: (1) inconsistent point cloud density, induced by ego-motion dynamics and environmental disturbances during repeated traversals, leads to descriptor instability, and (2) representation fragility stems from reliance on single-level geometric abstractions that lack discriminative power in structurally complex scenarios. To address these limitations, we propose a novel framework that redefines 3D place recognition through density-agnostic geometric reasoning. Specifically, we introduce an implicit 3D representation based on elastic points, which is insensitive to the density of the original scene point cloud and yields a uniformly distributed representation. Subsequently, we derive the occupancy grid and normal vector information of the scene from this implicit representation. Finally, with the aid of these two types of information, we obtain descriptors that fuse geometric information from both bird's-eye-view (capturing macro-level spatial layouts) and 3D segment (encoding micro-scale surface geometries) perspectives. We conducted extensive experiments on multiple datasets (KITTI, KITTI-360, MulRan, NCLT) spanning diverse environments. The results demonstrate that our method achieves state-of-the-art performance. Moreover, our approach strikes an effective balance among accuracy, runtime, and the memory footprint of historical maps, showcasing excellent resilience and scalability. Our code will be open-sourced in the future.
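To make the descriptor pipeline in the abstract concrete, below is a minimal NumPy sketch of the general idea: quantize a scan into an occupancy grid, estimate surface normals via local PCA, and fuse macro-level BEV occupancy with micro-level surface orientation into one descriptor vector. All function names, grid parameters, and the brute-force neighbor search are hypothetical illustrations, not the paper's actual implementation (which operates on the implicit elastic-point representation rather than the raw scan).

```python
import numpy as np

def occupancy_grid(points, voxel=0.5, bounds=(-10.0, 10.0)):
    """Binary 3D occupancy grid from an (N, 3) point array (hypothetical parameters)."""
    lo, hi = bounds
    n = int((hi - lo) / voxel)
    idx = np.floor((points - lo) / voxel).astype(int)
    mask = np.all((idx >= 0) & (idx < n), axis=1)
    grid = np.zeros((n, n, n), dtype=bool)
    grid[tuple(idx[mask].T)] = True
    return grid

def estimate_normals(points, k=8):
    """PCA normals from k nearest neighbors (brute-force, for illustration only)."""
    dist = np.linalg.norm(points[:, None] - points[None], axis=-1)
    nn = np.argsort(dist, axis=1)[:, 1:k + 1]  # skip self at column 0
    normals = np.empty_like(points)
    for i, idx in enumerate(nn):
        nbrs = points[idx] - points[idx].mean(axis=0)
        # singular vector of the smallest singular value ~ surface normal
        _, _, vt = np.linalg.svd(nbrs, full_matrices=False)
        normals[i] = vt[-1]
    return normals

def bev_descriptor(points, normals, voxel=1.0, bounds=(-10.0, 10.0)):
    """Fuse macro layout (BEV occupancy) with micro geometry (mean |n_z| per cell)."""
    lo, hi = bounds
    n = int((hi - lo) / voxel)
    ij = np.floor((points[:, :2] - lo) / voxel).astype(int)
    mask = np.all((ij >= 0) & (ij < n), axis=1)
    occ = np.zeros((n, n))
    tilt = np.zeros((n, n))
    cnt = np.zeros((n, n))
    for (i, j), nz in zip(ij[mask], np.abs(normals[mask, 2])):
        occ[i, j] = 1.0
        tilt[i, j] += nz
        cnt[i, j] += 1
    tilt = np.divide(tilt, cnt, out=np.zeros_like(tilt), where=cnt > 0)
    return np.concatenate([occ.ravel(), tilt.ravel()])
```

Averaging per-cell statistics, rather than summing raw point counts, is one simple way such descriptors reduce sensitivity to point density; the paper's uniform implicit resampling addresses the same issue more directly.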