🤖 AI Summary
This paper investigates where the performance gains of generative policies in behavior cloning (BC) come from, challenging the prevailing assumption that modeling multi-modal action distributions is essential. Method: Through systematic ablations and comparative experiments on popular generative control policies (GCPs), we identify iterative computation with supervised intermediate steps, paired with a suitable level of stochasticity, as the primary driver of improvement, rather than expressive distribution modeling. To validate this, we propose the minimum iterative policy (MIP), a lightweight two-step regression-based policy. Contribution/Results: On common BC benchmarks, MIP essentially matches the performance of flow-based GCPs and often outperforms distilled shortcut models. These findings suggest that the distribution-fitting component of GCPs is less salient than commonly believed, and they point toward new design spaces focused on control performance.
📝 Abstract
Generative models, such as flows and diffusions, have recently emerged as popular and efficacious policy parameterizations in robotics. There has been much speculation as to the factors underlying their successes, ranging from capturing multi-modal action distributions to expressing more complex behaviors. In this work, we perform a comprehensive evaluation of popular generative control policies (GCPs) on common behavior cloning (BC) benchmarks. We find that GCPs do not owe their success to their ability to capture multi-modality or to express more complex observation-to-action mappings. Instead, we find that their advantage stems from iterative computation, as long as intermediate steps are supervised during training and this supervision is paired with a suitable level of stochasticity. As a validation of our findings, we show that a minimum iterative policy (MIP), a lightweight two-step regression-based policy, essentially matches the performance of flow GCPs and often outperforms distilled shortcut models. Our results suggest that the distribution-fitting component of GCPs is less salient than commonly believed, and point toward new design spaces focusing solely on control performance. Project page: https://simchowitzlabpublic.github.io/much-ado-about-noising-project/
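To make the idea of a two-step regression policy with supervised intermediate steps more concrete, the following is a minimal illustrative sketch and not the authors' implementation. Everything in it is an assumption introduced for illustration: the function names (`features`, `fit_two_step`, `act_two_step`), the parameters `t_star` and `noise_std`, the way the intermediate input is built from a scaled, noised expert action, and the use of a linear least-squares model in place of a neural network.

```python
import numpy as np


def features(obs, x, step):
    # Concatenate the observation, the current action iterate, and a one-hot
    # step index (the one-hot columns also act as per-step bias terms).
    step_onehot = np.zeros((obs.shape[0], 2))
    step_onehot[:, step] = 1.0
    return np.concatenate([obs, x, step_onehot], axis=1)


def fit_two_step(obs, act, t_star=0.9, noise_std=0.1, seed=0):
    # Hypothetical training scheme: both steps regress onto the expert action.
    rng = np.random.default_rng(seed)
    # Step 0 starts from a zero iterate.
    x0 = np.zeros_like(act)
    # Step 1 starts from a scaled expert action plus Gaussian noise, so the
    # intermediate input is supervised and carries some stochasticity.
    x1 = t_star * act + noise_std * rng.standard_normal(act.shape)
    X = np.concatenate([features(obs, x0, 0), features(obs, x1, 1)])
    Y = np.concatenate([act, act])
    # A shared linear model stands in for a step-conditioned network.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def act_two_step(W, obs, t_star=0.9):
    # Inference: two deterministic regression passes, no sampling loop.
    x = np.zeros((obs.shape[0], W.shape[1]))
    a0 = features(obs, x, 0) @ W
    return features(obs, t_star * a0, 1) @ W


# Toy data: actions are a linear function of observations.
rng = np.random.default_rng(1)
obs = rng.standard_normal((256, 4))
act = obs @ rng.standard_normal((4, 2))

W = fit_two_step(obs, act)
pred = act_two_step(W, obs)
print("max abs error:", float(np.abs(pred - act).max()))
```

On this toy linear problem the second step has little to correct, so the sketch only shows the training structure: two regression targets, one of which is conditioned on a noised intermediate. It makes no claim about the benchmark results reported in the abstract.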