🤖 AI Summary
Behavior cloning of visuomotor policies is highly susceptible to covariate shift, where minor state deviations compound into catastrophic failure. Existing mitigation strategies, such as human-in-the-loop correction or task-specific data augmentation, are either prohibitively costly or rely on strong domain assumptions, and often compromise imitation fidelity. To address this, we propose an implicit policy barrier mechanism that constructs a boundary around the expert data distribution in latent space, decoupling behavior imitation from out-of-distribution recovery. Specifically, we employ a diffusion-based policy model to capture expert behavior and jointly learn a dynamics model from both expert and suboptimal trajectories, enabling safe latent-space prediction and optimization. Our approach requires no manual intervention or external data augmentation, ensuring both reliable in-distribution inference and action safety. Evaluated on both simulation and real-robot tasks, it significantly improves robustness and data efficiency, achieving stable manipulation with only minimal expert demonstrations.
📝 Abstract
Visuomotor policies trained via behavior cloning are vulnerable to covariate shift, where small deviations from expert trajectories can compound into failure. Common strategies to mitigate this issue involve expanding the training distribution through human-in-the-loop corrections or synthetic data augmentation. However, these approaches are often labor-intensive, rely on strong task assumptions, or compromise the quality of imitation. We introduce Latent Policy Barrier (LPB), a framework for robust visuomotor policy learning. Inspired by Control Barrier Functions, LPB treats the latent embeddings of expert demonstrations as an implicit barrier separating safe, in-distribution states from unsafe, out-of-distribution (OOD) ones. Our approach decouples the roles of precise expert imitation and OOD recovery into two separate modules: a base diffusion policy trained solely on expert data, and a dynamics model trained on both expert and suboptimal policy rollout data. At inference time, the dynamics model predicts future latent states and optimizes them to stay within the expert distribution. Both simulated and real-world experiments show that LPB improves both policy robustness and data efficiency, enabling reliable manipulation from limited expert data and without additional human correction or annotation.
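The abstract's inference-time step, predicting future latent states and pulling them back toward the expert distribution, can be illustrated with a toy sketch. Everything below is a hypothetical illustration, not the paper's implementation: the function names, the nearest-expert-latent distance used as an OOD barrier cost, and the simple iterative correction are all assumptions for exposition.

```python
import numpy as np

def barrier_cost(z, expert_latents):
    """Illustrative OOD proxy: distance from latent z to the nearest
    expert latent, plus that nearest neighbor (an assumption, not the
    paper's actual barrier)."""
    d = np.linalg.norm(expert_latents - z, axis=1)
    return d.min(), expert_latents[d.argmin()]

def correct_latent(z, expert_latents, step=0.5, n_iters=20, tol=0.05):
    """Iteratively nudge a (hypothetically predicted) future latent back
    inside the expert distribution before executing the action."""
    z = z.copy()
    for _ in range(n_iters):
        cost, nearest = barrier_cost(z, expert_latents)
        if cost < tol:                 # already in-distribution: stop
            break
        z += step * (nearest - z)      # step toward the expert manifold
    return z

# Toy data: expert latents clustered near the origin, one far-OOD latent.
rng = np.random.default_rng(0)
expert_latents = rng.normal(0.0, 0.1, size=(64, 8))
z_ood = np.full(8, 2.0)
z_fixed = correct_latent(z_ood, expert_latents)
assert barrier_cost(z_fixed, expert_latents)[0] < barrier_cost(z_ood, expert_latents)[0]
```

In LPB the analogous optimization would act on latents produced by a learned dynamics model, with the expert embedding set defining the barrier; here both are replaced by random toy arrays purely to show the control flow.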