🤖 AI Summary
This paper addresses few-shot, out-of-distribution (OOD), and Sim2Real generalization in classifying object states and relations for long-horizon robotic tasks. It proposes a few-shot classification framework that combines a predicate hierarchy with hyperbolic geometry. The method consists of an object-centric scene encoder, a self-supervised loss that infers relations between predicates, and a hierarchical hyperbolic embedding space in which hyperbolic distance explicitly models the semantic hierarchy among predicates. The stated key contribution is the first incorporation of predicate-level hierarchical structure into hyperbolic representation learning, which enables strong zero-shot and few-shot generalization. Evaluated on the CALVIN and BEHAVIOR benchmarks, the approach outperforms existing methods on state classification in few-shot, OOD, and Sim2Real transfer settings.
📝 Abstract
State classification of objects and their relations is core to many long-horizon tasks, particularly in robot planning and manipulation. However, the combinatorial explosion of possible object-predicate combinations, coupled with the need to adapt to novel real-world environments, means that state classification models must generalize to novel queries from few examples. To this end, we propose PHIER, which leverages predicate hierarchies to generalize effectively in few-shot scenarios. PHIER uses an object-centric scene encoder, self-supervised losses that infer semantic relations between predicates, and a hyperbolic distance metric that captures hierarchical structure. With these components, it learns a structured latent space of image-predicate pairs that guides reasoning over state classification queries. We evaluate PHIER in the CALVIN and BEHAVIOR robotic environments and show that it significantly outperforms existing methods in few-shot, out-of-distribution state classification and generalizes well from simulated to real-world tasks in zero- and few-shot settings. These results indicate that leveraging predicate hierarchies improves performance on state classification tasks with limited data.
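The abstract does not say which model of hyperbolic space PHIER uses. As a minimal sketch, assuming the common Poincaré ball model, the snippet below shows why a hyperbolic distance suits hierarchies: pairs of points near the boundary of the ball are far apart even when their Euclidean separation is modest, which leaves room for many specific concepts to branch away from a few general ones near the origin. The function name `poincare_distance` and the example coordinates are hypothetical illustrations, not taken from the paper.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    Uses d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


# Hypothetical 2-D embeddings: two points near the origin (general concepts)
# and the same directions pushed toward the boundary (specific concepts).
general_a, general_b = (0.1, 0.0), (0.0, 0.1)
specific_a, specific_b = (0.9, 0.0), (0.0, 0.9)

d_general = poincare_distance(general_a, general_b)
d_specific = poincare_distance(specific_a, specific_b)

# The boundary pair is much farther apart than Euclidean scaling alone suggests.
print(f"near origin:   {d_general:.3f}")
print(f"near boundary: {d_specific:.3f}")
```

In a learned embedding, coordinates like these would come from the model rather than being set by hand; the sketch only illustrates the geometry of the distance itself.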