🤖 AI Summary
This paper addresses the compounding-error problem in imitation learning for continuous state-action spaces: because expert demonstrations provide no interactive feedback, small policy errors can accumulate exponentially over the horizon and degrade stability in physical systems (e.g., autonomous driving, robotics). To mitigate this, the authors propose two non-interactive, minimal-intervention strategies that integrate robust-control principles into behavioral cloning: "action chunking," suited to open-loop-stable systems, and "training-time noise injection," designed for potentially unstable systems. Theoretically, they prove that under benign assumptions both methods suppress error growth from exponential to polynomial in the horizon, significantly improving policy stability and generalization.
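To make the first intervention concrete, here is a minimal sketch of action chunking at rollout time: the policy is queried once per chunk and its predicted action sequence is played open-loop, rather than re-querying at every step and feeding the policy's own state errors back into its input. The names `policy`, `dynamics`, and the chunk length are hypothetical placeholders, not from the paper.

```python
import numpy as np

def rollout_chunked(policy, dynamics, x0, horizon, chunk_len):
    """Roll out a chunked policy for `horizon` steps.

    `policy(x)` is a hypothetical learned map from a state to a
    (chunk_len, action_dim) array of actions; `dynamics(x, u)` returns
    the next state. The policy is queried only once per chunk, and the
    chunk is executed open-loop.
    """
    x, states = x0, [x0]
    t = 0
    while t < horizon:
        chunk = policy(x)                  # one policy query per chunk
        for u in chunk[: horizon - t]:     # play the chunk open-loop
            x = dynamics(x, u)
            states.append(x)
        t += chunk_len
    return states
```

On an open-loop-stable system (e.g., a contraction like `x_next = 0.9 * x + u`), playing chunks open-loop lets the stable dynamics attenuate the policy's prediction errors between queries, which is the regime where the paper prescribes this intervention.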
📝 Abstract
We study the problem of imitating an expert demonstrator in a continuous state-and-action dynamical system. While imitation learning in discrete settings such as autoregressive language modeling has seen immense success and popularity in recent years, imitation in physical settings such as autonomous driving and robot learning has proven comparably more complex due to the compounding errors problem, often requiring elaborate set-ups to perform stably. Recent work has demonstrated that even in benign settings, exponential compounding errors are unavoidable when learning solely from expert-controlled trajectories, suggesting the need for more advanced policy parameterizations or data augmentation. To this end, we present minimal interventions that provably mitigate compounding errors in continuous state-and-action imitation learning. When the system is open-loop stable, we prescribe "action chunking," i.e., predicting and playing sequences of actions in open-loop; when the system is possibly unstable, we prescribe "noise injection," i.e., adding noise during expert demonstrations. These interventions align with popular choices in modern robot learning, though the benefits we derive are distinct from the effects they were designed to target. Our results draw insights and tools from both control theory and reinforcement learning; however, our analysis reveals novel considerations that do not naturally arise when either literature is considered in isolation.
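The second intervention, noise injection during expert demonstrations, can be sketched as follows. Perturbing the expert's executed actions makes the recorded states cover a neighborhood of the expert trajectory instead of the single noise-free path. As an illustrative assumption (a DART-style convention, not necessarily the paper's exact protocol), each visited state is labeled with the expert's clean action, so the cloned policy learns to steer back toward the demonstration from slightly perturbed states. `expert` and `dynamics` are hypothetical callables.

```python
import numpy as np

def collect_noisy_demo(expert, dynamics, x0, horizon, sigma, rng):
    """Collect one demonstration with training-time noise injection.

    At each step the expert's action is perturbed with Gaussian noise
    of scale `sigma` before execution, broadening the state coverage
    of the dataset. The recorded label is the expert's clean action
    at the visited state (an assumed labeling convention).
    """
    x, data = x0, []
    for _ in range(horizon):
        u_clean = expert(x)
        data.append((x, u_clean))          # label with the clean action
        noise = sigma * rng.standard_normal(u_clean.shape)
        x = dynamics(x, u_clean + noise)   # execute the perturbed action
    return data
```

The noise scale `sigma` trades off coverage against demonstration quality; for possibly unstable systems, this added coverage is what the paper argues prevents exponential error compounding.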