🤖 AI Summary
This work addresses the challenge of learning diverse robotic skills under imitation constraints in the offline setting, where no environment interaction is permitted, rewards are non-stationary, and previously acquired skills must remain recallable. We propose a novel offline diversity-learning framework that introduces a van der Waals (VdW)–inspired diversity objective over successor-feature representations, integrated with a pre-trained Functional Reward Encoding (FRE) for reward-conditioned modeling. This enables zero-shot skill recall and robust value-function and policy learning under non-stationary rewards, without online interaction or an explicit skill discriminator. The method jointly optimizes conditional value functions and policies grounded in interpretable reward-conditioned representations. Evaluated in simulation on quadrupedal locomotion and obstacle-aware local navigation, our approach significantly improves skill diversity and training stability, and enlarges the reusable skill library compared to prior offline methods.
📝 Abstract
Although most algorithms for diversity maximization under imitation constraints are online in nature, many applications require offline algorithms that need no environment interaction. Tackling this problem in the offline setting, however, presents significant challenges, requiring non-trivial, multi-stage optimization with non-stationary rewards. In this work, we present a novel offline algorithm that enhances diversity using an objective based on the van der Waals (VdW) force and successor features, eliminating the skill discriminator that prior methods had to learn. Moreover, by conditioning the value function and policy on a pre-trained Functional Reward Encoding (FRE), our method better handles non-stationary rewards and provides zero-shot recall of all skills encountered during training, significantly expanding the set of skills learned in prior work. Consequently, our algorithm receives a consistently strong diversity signal (VdW) and enjoys more stable and efficient training. We demonstrate the effectiveness of our method in generating diverse skills for two robotic tasks in simulation: locomotion of a quadruped and local navigation with obstacle traversal.
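The abstract describes the VdW-force diversity objective only at a high level. As a hypothetical illustration of the idea, the sketch below scores a set of skills by summing a Lennard-Jones-style pairwise potential over their successor features: skills whose features nearly coincide are strongly repelled, while distant skills feel only a weak pull, so the set spreads out without diverging. The function name, the exact potential form, and the hyperparameters `x0` and `eps` are assumptions for illustration, not the paper's objective.

```python
import numpy as np

def vdw_diversity(psi, x0=1.0, eps=1.0):
    """Hypothetical van der Waals-style diversity score.

    psi : array of shape [n_skills, d], one successor-feature
          vector per skill.
    x0  : assumed equilibrium distance scale (repel below, attract above).
    eps : assumed strength of the interaction.

    Returns the mean negated Lennard-Jones potential over all skill
    pairs: near-duplicate skills yield a large negative score
    (collapse is penalized), well-separated skills a mildly
    positive one.
    """
    n = psi.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # Distance between the two skills' successor features,
            # with a small floor to avoid division by zero.
            d = np.linalg.norm(psi[i] - psi[j]) + 1e-8
            r = x0 / d
            # Lennard-Jones potential 4*eps*(r^12 - r^6), negated so
            # that higher values mean more diverse skills.
            total += -4.0 * eps * (r ** 12 - r ** 6)
    return total / (n * (n - 1) / 2)
```

In this toy form, three near-identical skills score far below three skills spaced roughly a distance `x0` apart, which is the qualitative behavior the VdW objective is meant to provide: a strong, always-available diversity signal that needs no learned discriminator.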