🤖 AI Summary
Existing implicit 3D representations (e.g., NeRF) struggle to model the intrinsic hierarchical structure of scenes and objects: multi-scale approaches incur high inference overhead from repeated rendering passes, while discrete hierarchical methods generalize poorly beyond their predefined structure. This paper proposes OpenHype, presented as the first NeRF framework to incorporate hyperbolic space embeddings for continuous, open-vocabulary hierarchical modeling. By encoding scene hierarchy in a hyperbolic latent space, OpenHype captures scale-invariant relational structure, and geodesic paths in this space enable smooth cross-scale traversal in a single rendering pass, eliminating the need for predefined hierarchies or iterative rendering. Evaluated on standard 3D understanding benchmarks, OpenHype achieves significant improvements in inference efficiency and structural generalization, demonstrating superior adaptability and robustness in complex, multi-scale scenes and establishing a new paradigm for hierarchical 3D representation learning.
📝 Abstract
Modeling the inherent hierarchical structure of 3D objects and 3D scenes is highly desirable, as it enables a more holistic understanding of environments for autonomous agents. Accomplishing this with implicit representations, such as Neural Radiance Fields, remains an unexplored challenge. Existing methods that explicitly model hierarchical structures often face significant limitations: they either require multiple rendering passes to capture embeddings at different levels of granularity, significantly increasing inference time, or rely on predefined, closed-set discrete hierarchies that generalize poorly to the diverse and nuanced structures encountered by agents in the real world. To address these challenges, we propose OpenHype, a novel approach that represents scene hierarchies using a continuous hyperbolic latent space. By leveraging the properties of hyperbolic geometry, OpenHype naturally encodes multi-scale relationships and enables smooth traversal of hierarchies through geodesic paths in latent space. Our method outperforms state-of-the-art approaches on standard benchmarks, demonstrating superior efficiency and adaptability in 3D scene understanding.
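The core idea of traversing a hierarchy via geodesics in hyperbolic space can be illustrated with a generic sketch. The snippet below is not taken from the paper; it assumes the common Poincaré ball model of hyperbolic space (curvature −1) and uses Möbius operations to walk along the geodesic between two embeddings. In a setup like OpenHype's, moving along such a path would correspond to sweeping continuously between levels of granularity (e.g., from an object part toward the whole scene) without re-rendering at each level; the function names here (`mobius_add`, `geodesic`) are illustrative, not from the paper's codebase.

```python
import numpy as np

def mobius_add(x, y):
    # Möbius addition on the Poincaré ball (curvature -1):
    # the hyperbolic analogue of vector addition.
    xy = np.dot(x, y)
    x2 = np.dot(x, x)
    y2 = np.dot(y, y)
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    den = 1 + 2 * xy + x2 * y2
    return num / den

def mobius_scalar(r, x):
    # Möbius scalar multiplication: rescales the geodesic
    # distance of x from the origin by the factor r.
    norm = np.linalg.norm(x)
    if norm < 1e-12:
        return np.zeros_like(x)
    return np.tanh(r * np.arctanh(norm)) * x / norm

def geodesic(x, y, t):
    # Point at fraction t along the geodesic from x (t=0) to y (t=1),
    # staying inside the unit ball for all t in [0, 1].
    return mobius_add(x, mobius_scalar(t, mobius_add(-x, y)))

# Example: interpolate between a "fine" and a "coarse" embedding.
fine = np.array([0.1, 0.2])
coarse = np.array([-0.3, 0.5])
midpoint = geodesic(fine, coarse, 0.5)
```

Sampling `t` densely between 0 and 1 yields a smooth path of query embeddings, which is the kind of continuous hierarchy traversal the abstract attributes to geodesics in the latent space.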