🤖 AI Summary
Existing dense captioning methods struggle to simultaneously preserve fine-grained local geometry and global semantic hierarchy in sparse point clouds, often resulting in inaccurate localization or fragmented descriptions. This work proposes a curvature-aware captioning framework that introduces, for the first time, a complementary curvature mechanism between Oblique manifolds and Lorentzian hyperboloids, effectively mitigating the modeling paradigm conflict between Euclidean and hyperbolic spaces. Long-range dependencies are captured via self-attention in Oblique space, while hierarchical semantic relationships among scene instances are modeled through bidirectional geodesic cross-attention in Lorentz space. Feature stability is further ensured by non-Euclidean embedding and isotropic optimization. The proposed method achieves state-of-the-art performance on both ScanRefer and Nr3D benchmarks, significantly improving target localization accuracy and enriching scene-level descriptions.
📝 Abstract
Accurate 3D scene description is fundamental to robotic navigation and augmented reality, yet current dense captioning methods face significant limitations in processing sparse point cloud data.
Existing approaches that apply Euclidean embedding spaces struggle to simultaneously preserve fine-grained local geometric details and model exponentially growing global semantic hierarchies, leading to either inaccurate localization or disjointed, shallow scene descriptions.
In this work, we propose Curvature-Aware Captioning, a novel framework that integrates non-Euclidean geodesic attention mechanisms to resolve the localization-contextualization conflict.
Specifically, self-attention within Oblique space enforces dimensional homogeneity while establishing long-range dependencies. Bidirectional geodesic cross-attention within Lorentz space models hierarchical semantic relationships across scene instances, enabling simultaneous precision in object localization and coherence in scene descriptions.
Theoretical analysis confirms that the curvature complementarity between the Oblique manifold and Lorentz hyperboloid resolves the Euclidean-hyperbolic conflict, ensuring feature stability via isotropic optimization while preserving inherent hierarchical relationships. Extensive experiments on ScanRefer and Nr3D benchmarks demonstrate state-of-the-art performance, with significant gains in both localization accuracy and descriptive richness.
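The abstract gives no formulas, so the following is only a minimal sketch of the two geometric ingredients it names, using their standard textbook definitions rather than the authors' implementation. It assumes the Oblique manifold is the set of feature matrices whose vectors each have unit Euclidean norm, and that the Lorentz hyperboloid has curvature -1. All function names are hypothetical, and scoring attention by negative geodesic distance is an illustrative assumption, not a detail confirmed by the source.

```python
import math


def project_to_oblique(rows):
    """Rescale each (nonzero) feature vector to unit Euclidean norm.

    The result is a point on the Oblique manifold: every vector has the
    same norm, so no single feature vector dominates by scale.
    """
    projected = []
    for v in rows:
        norm = math.sqrt(sum(x * x for x in v))
        projected.append([x / norm for x in v])
    return projected


def lift_to_hyperboloid(v):
    """Lift a Euclidean vector onto the unit hyperboloid (curvature -1).

    The leading "time" coordinate is chosen so that <x, x>_L = -1.
    """
    x0 = math.sqrt(1.0 + sum(x * x for x in v))
    return [x0] + list(v)


def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lorentz_distance(x, y):
    """Geodesic distance on the unit hyperboloid: arccosh(-<x, y>_L)."""
    # Clamp to 1.0 to guard against rounding pushing the argument below 1.
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


def geodesic_attention_weights(query, keys, temperature=1.0):
    """Softmax over negative geodesic distances (assumed scoring rule).

    Keys closer to the query along the hyperboloid get larger weights.
    """
    scores = [-lorentz_distance(query, k) / temperature for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

In this sketch, a bidirectional variant would simply compute the weights twice with the roles of queries and keys swapped. How the paper combines the two directions is not stated in the abstract.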