A Hyperbolic Perspective on Hierarchical Structure in Object-Centric Scene Representations

📅 2026-03-14
🤖 AI Summary
Existing slot attention mechanisms learn their representations in Euclidean space, which offers no geometric inductive bias for the hierarchical structure of visual scenes. This work proposes a post-processing approach that projects pretrained Euclidean slot embeddings into Lorentzian hyperbolic space, revealing the latent hierarchical organization from scene-level to object-level representations without altering the original training pipeline. To our knowledge, this is the first method to leverage hyperbolic geometry for uncovering implicit hierarchies in slot-based representations, accompanied by a systematic analysis of the trade-off between curvature and task performance. Experiments indicate that hyperbolic projection exposes hierarchical structure: coarse-grained slots reside at greater manifold depth than fine-grained slots, low curvature (c = 0.2) matches or outperforms Euclidean space on parent-slot retrieval, and moderate curvature (c = 0.5) improves inter-level separation. These findings are reported across three frameworks: SPOT, VideoSAUR, and SlotContrast.

📝 Abstract
Slot attention has emerged as a powerful framework for unsupervised object-centric learning, decomposing visual scenes into a small set of compact vector representations called slots, each capturing a distinct region or object. However, these slots are learned in Euclidean space, which provides no geometric inductive bias for the hierarchical relationships that naturally structure visual scenes. In this work, we propose a simple post-hoc pipeline that projects Euclidean slot embeddings onto the Lorentz hyperboloid of hyperbolic space, without modifying the underlying training pipeline. We construct five-level visual hierarchies directly from slot attention masks and analyse whether hyperbolic geometry reveals latent hierarchical structure that remains invisible in Euclidean space. Integrating our pipeline with SPOT (images), VideoSAUR (video), and SlotContrast (video), we find that hyperbolic projection exposes a consistent scene-level to object-level organisation, in which coarse slots occupy greater manifold depth than fine slots; this organisation is absent in Euclidean space. We further identify a "curvature-task tradeoff": low curvature ($c = 0.2$) matches or outperforms Euclidean on parent slot retrieval, while moderate curvature ($c = 0.5$) achieves better inter-level separation. Together, these findings suggest that slot representations already encode latent hierarchy that hyperbolic geometry reveals, motivating end-to-end hyperbolic training as a natural next step. Code and models are available at https://github.com/NeeluMadan/HHS.
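
To make the projection step concrete, here is a minimal sketch of one common way to lift Euclidean vectors onto the Lorentz hyperboloid: the exponential map at the origin, followed by the geodesic distance to the origin as a "manifold depth" score. The abstract does not say which map the authors use, so this choice is an assumption, as are the function names `expmap0` and `depth`, the random stand-in slots, and the slot count and dimension. The curvature value `c = 0.5` is taken from the abstract.

```python
import numpy as np

def expmap0(v, c):
    """Lift tangent vectors v (..., d) at the origin onto the hyperboloid
    of curvature -c, returning points of shape (..., d + 1) as [time, space]."""
    sqrt_c = np.sqrt(c)
    # Clamp the norm to avoid division by zero for all-zero vectors.
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-8)
    space = np.sinh(sqrt_c * norm) * v / (sqrt_c * norm)
    # The time component follows from the constraint <x, x>_L = -1/c.
    time = np.sqrt(1.0 / c + np.sum(space ** 2, axis=-1, keepdims=True))
    return np.concatenate([time, space], axis=-1)

def depth(x, c):
    """Geodesic distance from the hyperboloid origin to each point x."""
    sqrt_c = np.sqrt(c)
    # Clamp to 1 so rounding error cannot push arccosh out of its domain.
    return np.arccosh(np.maximum(sqrt_c * x[..., 0], 1.0)) / sqrt_c

# Hypothetical stand-in for pretrained slots: 7 slots of dimension 16.
rng = np.random.default_rng(0)
slots = 0.5 * rng.normal(size=(7, 16))

c = 0.5  # "moderate" curvature value quoted in the abstract
points = expmap0(slots, c)
depths = depth(points, c)

# Lorentzian inner product of each point with itself; should equal -1/c.
lorentz_sq = -points[:, 0] ** 2 + np.sum(points[:, 1:] ** 2, axis=-1)
```

Under this particular map, the depth of a point equals the Euclidean norm of the vector it was lifted from, so any depth ordering obtained this way reflects slot norms. A pipeline that applies a learned or rescaled projection before the map would behave differently.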
Problem

Research questions and friction points this paper is trying to address.

hierarchical structure
object-centric representation
hyperbolic geometry
slot attention
scene representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

hyperbolic geometry
slot attention
hierarchical representation
object-centric learning
manifold depth