🤖 AI Summary
Diffusion policies for robotic manipulation are prone to task failure because errors accumulate across generated action sequences. This work proposes a lightweight, classifier-based framework that steers pretrained diffusion policies away from failure modes at inference time via gradient guidance. Using an attention-based multiple instance learning approach, the method automatically labels observation-action segments as contributing to success or failure in a self-supervised manner, then trains a performance predictor on these labels to provide real-time gradient feedback. Requiring neither additional expert demonstrations nor expensive world models, the approach achieves consistent and significant performance improvements across multiple tasks in both the Robomimic and MimicGen benchmarks.
📝 Abstract
Diffusion policies have been shown to be highly effective at learning complex, multi-modal behaviors for robotic manipulation. However, errors in generated action sequences can compound over time, potentially leading to task failure. Existing approaches mitigate this by augmenting datasets with expert demonstrations or by learning predictive world models, both of which can be computationally expensive. We introduce Performance Predictive Guidance (PPGuide), a lightweight, classifier-based framework that steers a pretrained diffusion policy away from failure modes at inference time. PPGuide relies on a novel self-supervised process: it uses attention-based multiple instance learning to automatically estimate which observation-action chunks from the policy's rollouts are relevant to success or failure. We then train a performance predictor on this self-labeled data. During inference, this predictor provides a real-time gradient that guides the policy toward more robust actions. We validate PPGuide across a diverse set of tasks from the Robomimic and MimicGen benchmarks, demonstrating consistent performance improvements.
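The guidance mechanism described above can be sketched in miniature. The snippet below is a hedged illustration, not the paper's implementation: the `denoiser` is a trivial placeholder for the pretrained diffusion policy's noise predictor, and the performance predictor is reduced to a hand-written logistic model `p(success | a) = sigmoid(w · a)` whose gradient is analytic, whereas PPGuide's actual predictor is a learned attention-based network over observation-action chunks. All names (`guided_step`, `w`, the guidance `scale`) are illustrative assumptions; only the overall pattern, nudging each reverse-diffusion step along the gradient of predicted success, reflects the described method.

```python
import numpy as np

# Assumed toy predictor weights: p(success | a) = sigmoid(w @ a).
w = np.array([1.0, -0.5])

def denoiser(a_t, t):
    # Placeholder for the pretrained diffusion policy's noise prediction.
    return 0.1 * a_t

def success_log_grad(a):
    # Gradient of log p(success | a) for a logistic predictor:
    # d/da log sigmoid(w @ a) = (1 - sigmoid(w @ a)) * w
    return (1.0 - 1.0 / (1.0 + np.exp(-w @ a))) * w

def guided_step(a_t, t, scale=2.0, step=0.1):
    # One reverse-diffusion update, nudged along the predictor's gradient
    # so sampled actions drift toward higher predicted success.
    a_prev = a_t - step * denoiser(a_t, t)
    return a_prev + step * scale * success_log_grad(a_prev)

# Run a short guided denoising chain from a fixed starting action.
a = np.array([0.2, 0.3])
for t in reversed(range(10)):
    a = guided_step(a, t)
```

Because the guidance term is a positive multiple of `w`, the guided sample ends with a strictly higher predicted success probability than an unguided chain from the same start, which is the intended steering effect.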