🤖 AI Summary
In language-conditioned robotic manipulation tasks, high-level planners often generate infeasible subgoals by overlooking the capabilities of low-level controllers, leading to failures in hierarchical policies. To address this issue, this work proposes the HD-ExpIt framework, which iteratively fine-tunes a hierarchical diffusion policy online through environmental feedback, establishing a self-improving training loop. The approach autonomously discovers successful behaviors via diffusion-based planning and distills them back into the policy, enabling implicit co-optimization of planning and control without requiring explicit surrogate models or static offline datasets. Evaluated on the CALVIN long-horizon benchmark, HD-ExpIt substantially outperforms purely offline training schemes and achieves state-of-the-art performance among methods trained from scratch.
📝 Abstract
Hierarchical policies for language-conditioned manipulation decompose tasks into subgoals, where a high-level planner guides a low-level controller. However, these hierarchical agents often fail because the planner generates subgoals without considering the actual limitations of the controller. Existing solutions attempt to bridge this gap via intermediate modules or shared representations, but they remain limited by their reliance on fixed offline datasets. We propose HD-ExpIt, a framework for iterative fine-tuning of hierarchical diffusion policies via environment feedback. HD-ExpIt organizes training into a self-reinforcing cycle: it utilizes diffusion-based planning to autonomously discover successful behaviors, which are then distilled back into the hierarchical policy. This loop enables both components to improve while implicitly grounding the planner in the controller's actual capabilities without requiring explicit proxy models. Empirically, HD-ExpIt significantly improves hierarchical policies trained solely on offline data, achieving state-of-the-art performance on the long-horizon CALVIN benchmark among methods trained from scratch.