🤖 AI Summary
Behavior cloning (BC) degrades severely under observation noise or adversarial perturbations, which is particularly problematic in safety-critical applications such as autonomous driving. To address this, we propose a robust BC framework grounded in global Lipschitz regularization. We impose Lipschitz constraints on the policy network to bound its sensitivity to input perturbations, and we design neural architectures that provably satisfy global Lipschitz continuity, so that a robust policy can be learned within standard supervised learning. Crucially, our method requires no access to environment dynamics or reinforcement learning signals, yet provides certified robustness against bounded observation perturbations. Experiments across multiple Gymnasium benchmarks demonstrate substantial improvements in policy stability and safety under both stochastic noise and adversarial attacks. Our approach thus offers a practical, theoretically principled pathway for robust imitation learning in safety-critical domains.
📝 Abstract
Behavior Cloning (BC) is an effective imitation learning technique and has even been adopted in some safety-critical domains such as autonomous vehicles. BC trains a policy to mimic the behavior of an expert using a dataset composed only of state-action pairs demonstrated by the expert, without any additional interaction with the environment. However, during deployment, the policy's observations may contain measurement errors or adversarial disturbances. Because these observations may deviate from the true states, they can mislead the agent into taking suboptimal actions. In this work, we use a global Lipschitz regularization approach to enhance the robustness of the learned policy network. We show that the resulting global Lipschitz property provides a robustness certificate for the policy with respect to different bounded-norm perturbations. We then propose a way to construct a Lipschitz neural network that ensures policy robustness. We empirically validate our theory across various environments in Gymnasium.
Keywords: Robust Reinforcement Learning; Behavior Cloning; Lipschitz Neural Network
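To make the robustness certificate concrete: if a policy π is L-Lipschitz in the ℓ2 norm, then any observation perturbation δ changes the action by at most L·‖δ‖. The sketch below illustrates that bound on a small MLP policy. It is not the paper's construction, which the abstract does not specify. It assumes a simple spectral-norm rescaling of each layer and a 1-Lipschitz activation (tanh), and the network sizes, the target constant 2.0, and all function names are hypothetical.

```python
import numpy as np


def spectral_norm(W):
    """Largest singular value of W, i.e. the l2 Lipschitz constant of x -> W @ x."""
    return float(np.linalg.norm(W, ord=2))


def project_layers(weights, target_lipschitz):
    """Shrink each weight matrix so the product of spectral norms is <= target.

    Assumption: the budget is split evenly across layers. This is one simple
    choice for illustration, not the method proposed in the paper.
    """
    per_layer = target_lipschitz ** (1.0 / len(weights))
    projected = []
    for W in weights:
        scale = max(1.0, spectral_norm(W) / per_layer)  # never scale weights up
        projected.append(W / scale)
    return projected


def lipschitz_upper_bound(weights):
    """Product of layer spectral norms: a global bound when activations are 1-Lipschitz."""
    return float(np.prod([spectral_norm(W) for W in weights]))


def policy(weights, biases, obs):
    """MLP policy with tanh hidden activations (tanh is 1-Lipschitz)."""
    h = obs
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(W @ h + b)
    return weights[-1] @ h + biases[-1]


rng = np.random.default_rng(0)
sizes = [4, 32, 32, 2]  # hypothetical observation dim 4, action dim 2
weights = [rng.normal(size=(o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=o) for o in sizes[1:]]

weights = project_layers(weights, target_lipschitz=2.0)
L = lipschitz_upper_bound(weights)

# Perturb one observation by a vector of l2 norm 0.1 and compare the
# observed action change with the certified worst case L * ||delta||.
obs = rng.normal(size=4)
delta = rng.normal(size=4)
delta *= 0.1 / np.linalg.norm(delta)

action_shift = float(np.linalg.norm(
    policy(weights, biases, obs + delta) - policy(weights, biases, obs)
))
certified_shift = L * float(np.linalg.norm(delta))
print(f"L <= {L:.3f}, action shift {action_shift:.4f}, certified bound {certified_shift:.4f}")
```

The product of spectral norms is generally a loose upper bound on the true Lipschitz constant, so the observed action shift is typically well below the certified value. The certificate is useful because it holds for every perturbation within the norm budget, including adversarially chosen ones.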