🤖 AI Summary
This work addresses the poor generalization of policies that rely on absolute state representations when the training and real-world deployment environments differ, particularly when the reference frame moves, which often leads to task failure. The study systematically compares proprioceptive state encodings and proposes expressing the state in an episode-level relative reference frame. Combined with end-to-end policy learning, the approach is evaluated on a physical robot for manipulation robustness under both in-distribution and out-of-distribution conditions. The results show that this encoding gives the best trade-off between task performance and robustness, outperforming the baselines and suggesting a practical way to reuse data collected across reference frames and to deploy in unseen configurations.
📝 Abstract
As end-to-end robotic policies are increasingly deployed to solve real-world tasks, they face a gap between training and inference conditions. Scaling the amount and diversity of training data has shown some success in improving zero-shot generalization, yet robots still fail when faced with new, unseen test conditions. For instance, while robots with fixed frames of reference are common, those with moving frames pose a greater challenge for deployment. To address this specific instance of the problem, we study strategies for encoding the robot's proprioceptive state to improve both in- and out-of-distribution performance at test time. Through a systematic study of joint representations, we find that a simple episode-wise relative frame provides the best trade-off between task performance and robustness, outperforming the baselines in extensive real-robot experiments conducted in a realistic test environment. The results suggest a practical path to leveraging data collected by robots with varying frames of reference and to deploying policies in unseen test configurations.
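The idea of an episode-wise relative frame can be illustrated with a small sketch. This is not the authors' implementation. It assumes that the proprioceptive state is an end-effector pose given as a 4x4 rigid transform, that each episode is anchored at its first pose, and that the change of reference frame is a constant offset within an episode. The helpers `pose`, `rigid_inverse`, `episode_relative`, and `close` are hypothetical names introduced only for illustration. The sketch shows that absolute poses change when the reference frame is displaced, while the episode-relative encoding does not.

```python
import math
from typing import List

Matrix = List[List[float]]  # 4x4 homogeneous rigid transform, row-major


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def rigid_inverse(t: Matrix) -> Matrix:
    """Invert a rigid transform [R p; 0 1] as [R^T  -R^T p; 0 1]."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]  # R transposed
    p = [t[i][3] for i in range(3)]
    neg_rt_p = [-sum(rt[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [rt[i] + [neg_rt_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def pose(yaw: float, x: float, y: float, z: float) -> Matrix:
    """Hypothetical helper: rotation about z by `yaw`, then translation (x, y, z)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x], [s, c, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def episode_relative(poses: List[Matrix]) -> List[Matrix]:
    """Express every pose in the frame of the episode's first pose: T_0^-1 T_t."""
    anchor_inv = rigid_inverse(poses[0])
    return [matmul(anchor_inv, t) for t in poses]


def close(a: Matrix, b: Matrix, tol: float = 1e-9) -> bool:
    """Element-wise comparison of two 4x4 matrices within a tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(4) for j in range(4))


if __name__ == "__main__":
    # A short end-effector trajectory expressed in the training reference frame.
    traj = [pose(0.0, 0.4, 0.0, 0.3), pose(0.1, 0.45, 0.05, 0.28), pose(0.2, 0.5, 0.1, 0.25)]
    # The same motion observed from a reference frame displaced by a constant offset G.
    offset = pose(0.7, 1.5, -0.8, 0.0)
    shifted = [matmul(offset, t) for t in traj]

    print(close(traj[1], shifted[1]))  # absolute states differ: False
    rel_a, rel_b = episode_relative(traj), episode_relative(shifted)
    print(all(close(a, b) for a, b in zip(rel_a, rel_b)))  # relative states match: True
```

The invariance follows from (G T_0)^-1 (G T_t) = T_0^-1 G^-1 G T_t = T_0^-1 T_t for any constant offset G. If the reference frame moved during an episode, this cancellation would no longer hold exactly, so the sketch covers only offsets that stay fixed within an episode.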