HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views

📅 2025-03-11
🤖 AI Summary
This paper addresses multi-view 3D place recognition for cross-platform (ground/airborne) LiDAR point clouds in large-scale urban and forest scenes. We propose a hierarchical octree-based Transformer architecture. Key contributions include: (i) a cylindrical-adaptive octree attention window that respects geometric structure and varying point densities; (ii) a relay-token-driven global–local interaction mechanism enabling efficient long-range contextual modeling; and (iii) pyramid attention pooling to generate robust global descriptors. The method is fully end-to-end trainable and explicitly handles challenges posed by large spatial extents, heterogeneous environments, and non-uniform point densities. Evaluated on the newly introduced CS-Wild-Places cross-source in-the-wild dataset, our approach achieves 5.5–11.5% absolute improvement in Top-1 mean recall. On established urban and forest benchmarks, it surpasses state-of-the-art methods by 5.8% in average performance.

📝 Abstract
We present HOTFormerLoc, a novel and versatile Hierarchical Octree-based Transformer, for large-scale 3D place recognition in both ground-to-ground and ground-to-aerial scenarios across urban and forest environments. We propose an octree-based multi-scale attention mechanism that captures spatial and semantic features across granularities. To address the variable density of point distributions from spinning lidar, we present cylindrical octree attention windows to reflect the underlying distribution during attention. We introduce relay tokens to enable efficient global-local interactions and multi-scale representation learning at reduced computational cost. Our pyramid attentional pooling then synthesises a robust global descriptor for end-to-end place recognition in challenging environments. In addition, we introduce CS-Wild-Places, a novel 3D cross-source dataset featuring point cloud data from aerial and ground lidar scans captured in dense forests. Point clouds in CS-Wild-Places contain representational gaps and distinctive attributes such as varying point densities and noise patterns, making it a challenging benchmark for cross-view localisation in the wild. HOTFormerLoc achieves a top-1 average recall improvement of 5.5% to 11.5% on the CS-Wild-Places benchmark. Furthermore, it consistently outperforms SOTA 3D place recognition methods, with an average performance gain of 5.8% on well-established urban and forest datasets. The code and CS-Wild-Places benchmark are available at https://csiro-robotics.github.io/HOTFormerLoc.
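The top-1 average recall quoted in the abstract is a retrieval metric: a query counts as a success when its nearest database descriptor belongs to a scan taken close enough to the query's true position. The sketch below shows one generic way to compute Recall@k with NumPy. The function name `recall_at_k`, the 30 m `radius` threshold, and the use of Euclidean descriptor distance are illustrative assumptions, not the evaluation protocol of the paper.

```python
import numpy as np

def recall_at_k(query_desc, db_desc, query_pos, db_pos, k=1, radius=30.0):
    """Fraction of queries whose k nearest database descriptors contain
    at least one true match. `radius` (metres) is an assumed threshold."""
    # Pairwise Euclidean distances in descriptor space, shape (Q, D).
    d_feat = np.linalg.norm(query_desc[:, None, :] - db_desc[None, :, :], axis=-1)
    topk = np.argsort(d_feat, axis=1)[:, :k]

    # A database scan is a true match if it was captured within `radius`
    # of the query's ground-truth position.
    d_pos = np.linalg.norm(query_pos[:, None, :] - db_pos[None, :, :], axis=-1)
    is_true = d_pos <= radius

    # Only queries that have at least one true match are evaluated.
    has_gt = is_true.any(axis=1)
    if not has_gt.any():
        return 0.0
    hits = np.take_along_axis(is_true, topk, axis=1).any(axis=1)
    return float(hits[has_gt].mean())

# Toy data: three database scans and two queries.
db_desc = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 20.0]])
db_pos = np.array([[0.0, 0.0], [100.0, 0.0], [200.0, 0.0]])
query_desc = np.array([[0.1, 0.0], [19.0, 19.0]])
query_pos = np.array([[5.0, 0.0], [110.0, 0.0]])

r1 = recall_at_k(query_desc, db_desc, query_pos, db_pos, k=1)
r2 = recall_at_k(query_desc, db_desc, query_pos, db_pos, k=2)
```

In the toy data the second query retrieves a descriptor from the wrong place first and the correct one second, so Recall@1 is lower than Recall@2.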
Problem

Research questions and friction points this paper is trying to address.

Develops a versatile 3D place recognition method for ground and aerial views.
Addresses variable point density in lidar data with cylindrical octree attention.
Introduces a new dataset for cross-view localization in challenging environments.
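As a rough illustration of the cylindrical octree idea, the sketch below converts points to cylindrical coordinates, quantises them to an octree-style grid, orders them along a z-order (Morton) curve, and cuts the ordering into fixed-size attention windows. It is a simplified reading of the summary above, not the authors' implementation: it groups raw points rather than non-empty octants, and the function names, `depth`, and `window_size` are hypothetical.

```python
import numpy as np

def cartesian_to_cylindrical(xyz):
    """Map (x, y, z) to (rho, theta, z), with the sensor at the origin."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)  # angle in [-pi, pi]
    return np.stack([rho, theta, z], axis=1)

def morton_keys(coords, depth):
    """Quantise coords to 2**depth cells per axis and interleave the bits."""
    lo = coords.min(axis=0)
    span = np.maximum(coords.max(axis=0) - lo, 1e-9)
    cells = ((coords - lo) / span * (1 << depth)).astype(np.int64)
    cells = np.minimum(cells, (1 << depth) - 1)
    keys = np.zeros(len(coords), dtype=np.int64)
    for bit in range(depth):
        for axis in range(3):
            keys |= ((cells[:, axis] >> bit) & 1) << (3 * bit + axis)
    return keys

def attention_windows(xyz, depth=6, window_size=4):
    """Group point indices into windows that are contiguous in z-order
    over cylindrical space (assumed simplification)."""
    cyl = cartesian_to_cylindrical(xyz)
    order = np.argsort(morton_keys(cyl, depth), kind="stable")
    return [order[i:i + window_size] for i in range(0, len(order), window_size)]

xyz = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 0.0],
                [0.0, -1.0, 0.0], [2.0, 0.0, 0.0]])
cyl = cartesian_to_cylindrical(xyz)
windows = attention_windows(xyz, depth=4, window_size=2)
```

Because a spinning lidar samples roughly uniformly in angle and range rather than in x and y, cells in (rho, theta, z) space tend to hold more balanced point counts than Cartesian cells, which is the motivation the summary gives for cylindrical windows.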
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Octree Transformer for 3D place recognition
Cylindrical octree attention for variable density
Relay tokens for efficient global-local interactions
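The relay-token idea can be sketched in a few lines: each local window is summarised by one token, those summary tokens attend to each other globally, and each window then attends over its own tokens plus its updated summary. The NumPy code below is a minimal sketch under strong assumptions: single-head attention with identity projections (no learned weights), mean pooling to initialise relay tokens, and equal-sized windows. The names `self_attention` and `relay_token_block` are hypothetical and do not come from the HOTFormerLoc code base.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    """Single-head scaled dot-product attention with identity projections."""
    scores = tokens @ tokens.swapaxes(-1, -2) / np.sqrt(tokens.shape[-1])
    return softmax(scores) @ tokens

def relay_token_block(windows):
    """windows: (W, N, C) array holding W local windows of N tokens each.
    Returns updated local tokens (W, N, C) and relay tokens (W, C)."""
    # 1. Summarise each window into one relay token (mean pooling assumed).
    relay = windows.mean(axis=1)
    # 2. Global step: relay tokens attend to each other. The score matrix
    #    is W x W, far smaller than full attention over all W * N tokens.
    relay = self_attention(relay)
    # 3. Local step: each window attends over its tokens plus its relay token,
    #    which carries context from every other window.
    joint = np.concatenate([relay[:, None, :], windows], axis=1)
    joint = self_attention(joint)
    return joint[:, 1:, :], joint[:, 0, :]

rng = np.random.default_rng(0)
windows = rng.normal(size=(3, 4, 8))
local_out, relay_out = relay_token_block(windows)

# Perturb a distant window and observe that window 0 still changes.
perturbed = windows.copy()
perturbed[2] += 1.0
local_out_perturbed, _ = relay_token_block(perturbed)
window0_changed = not np.allclose(local_out[0], local_out_perturbed[0])
```

The perturbation check at the end shows the intended effect: tokens in window 0 never attend to window 2 directly, yet their outputs change because the relay token passes global context into the local step.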
Ethan Griffiths
CSIRO Robotics, Data61, CSIRO
Maryam Haghighat
Queensland University of Technology (QUT)
Simon Denman
Queensland University of Technology
Computer Vision, Biometrics, Intelligent Surveillance
C. Fookes
Queensland University of Technology (QUT)
Milad Ramezani
Team Leader | Senior Research Scientist, CSIRO Data61
SLAM, Robotics, Machine Learning