🤖 AI Summary
Goal-Conditioned Behavioral Cloning (GCBC) exhibits limited zero-shot combinatorial generalization (i.e., generalization to unseen state-goal pairs), primarily because its learned state representations lack temporal consistency. To address this, we propose a self-predictive representation learning framework built around a novel BYOL-$\gamma$ objective: it requires neither contrastive sampling nor temporal-difference learning, and is theoretically grounded as an approximation of the Successor Representation, thereby explicitly capturing long-horizon dependencies between states and goals. Integrated into the GCBC architecture, this representation yields competitive performance across multiple combinatorial generalization benchmarks, significantly improving zero-shot transfer to novel state-goal pairs without additional task-specific supervision or architectural modifications.
📝 Abstract
Behavioral cloning (BC) methods trained with supervised learning (SL) are an effective way to learn policies from human demonstrations in domains like robotics. Goal-conditioning these policies enables a single generalist policy to capture diverse behaviors contained within an offline dataset. While goal-conditioned behavioral cloning (GCBC) methods can perform well on in-distribution training tasks, they do not necessarily generalize zero-shot to tasks that require conditioning on novel state-goal pairs, i.e., combinatorial generalization. In part, this limitation can be attributed to a lack of temporal consistency in the state representation learned by BC: if temporally related states are encoded to similar latent representations, then the out-of-distribution gap for novel state-goal pairs would be reduced. Hence, encouraging this temporal consistency in the representation space should facilitate combinatorial generalization. Successor representations, which encode the distribution of future states visited from the current state, nicely encapsulate this property. However, previous methods for learning successor representations have relied on contrastive samples, temporal-difference (TD) learning, or both. In this work, we propose a simple yet effective representation learning objective, $\text{BYOL-}\gamma$ augmented GCBC, which not only provably approximates the successor representation in the finite MDP case without contrastive samples or TD learning, but also achieves competitive empirical performance across a suite of challenging tasks requiring combinatorial generalization.
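To make the successor-representation connection concrete, the sketch below works through the finite MDP case the abstract refers to. In a finite MDP, the successor representation is $M = (I - \gamma P)^{-1}$, and its normalization $(1-\gamma)M$ is the distribution of a future state sampled at a geometrically distributed time offset; matching that sampled future state is the kind of prediction target a BYOL-style objective can regress without contrastive pairs or TD bootstrapping. The transition matrix, state count, and sample sizes here are illustrative assumptions, not the paper's actual experimental setup.

```python
import numpy as np

# A tiny 3-state Markov chain (hypothetical, for illustration only).
P = np.array([[0.1, 0.9, 0.0],
              [0.0, 0.1, 0.9],
              [0.9, 0.0, 0.1]])
gamma = 0.9

# Successor representation in the finite case:
# M[s, s'] = E[ sum_t gamma^t * 1(s_t = s') | s_0 = s ] = (I - gamma * P)^{-1}
M = np.linalg.inv(np.eye(3) - gamma * P)

# The normalized SR, (1 - gamma) * M, is the distribution of a future state
# reached after a Geometric(1 - gamma) number of steps -- exactly what a
# self-predictive objective can target by sampling such future states.
rng = np.random.default_rng(0)
n = 5000
counts = np.zeros((3, 3))
for s0 in range(3):
    for _ in range(n):
        s = s0
        offset = rng.geometric(1 - gamma) - 1  # support {0, 1, 2, ...}
        for _ in range(offset):
            s = rng.choice(3, p=P[s])
        counts[s0, s] += 1
mc = counts / n  # Monte-Carlo estimate of (1 - gamma) * M

print(np.max(np.abs(mc - (1 - gamma) * M)))
```

The printed value is the worst-case gap between the sampled estimate and the closed-form normalized SR; it shrinks as `n` grows, illustrating why geometric future-state sampling suffices to recover the SR without negatives or bootstrapped targets.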