Iterative On-Policy Refinement of Hierarchical Diffusion Policies for Language-Conditioned Manipulation

📅 2026-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
In language-conditioned robotic manipulation tasks, high-level planners often generate infeasible subgoals by overlooking the capabilities of low-level controllers, leading to failures in hierarchical policies. To address this issue, this work proposes the HD-ExpIt framework, which iteratively fine-tunes a hierarchical diffusion policy online through environmental feedback, establishing a self-improving training loop. The approach autonomously discovers successful behaviors via diffusion-based planning and distills them back into the policy, enabling implicit co-optimization of planning and control without requiring explicit surrogate models or static offline datasets. Evaluated on the CALVIN long-horizon benchmark, HD-ExpIt substantially outperforms purely offline training schemes and achieves state-of-the-art performance among methods trained from scratch.

📝 Abstract
Hierarchical policies for language-conditioned manipulation decompose tasks into subgoals, where a high-level planner guides a low-level controller. However, these hierarchical agents often fail because the planner generates subgoals without considering the actual limitations of the controller. Existing solutions attempt to bridge this gap via intermediate modules or shared representations, but they remain limited by their reliance on fixed offline datasets. We propose HD-ExpIt, a framework for iterative fine-tuning of hierarchical diffusion policies via environment feedback. HD-ExpIt organizes training into a self-reinforcing cycle: it utilizes diffusion-based planning to autonomously discover successful behaviors, which are then distilled back into the hierarchical policy. This loop enables both components to improve while implicitly grounding the planner in the controller's actual capabilities without requiring explicit proxy models. Empirically, HD-ExpIt significantly improves hierarchical policies trained solely on offline data, achieving state-of-the-art performance on the long-horizon CALVIN benchmark among methods trained from scratch.
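The self-reinforcing cycle the abstract describes — sample candidate subgoals, let the low-level controller attempt them, keep the behaviors that succeed, and distill them back into the planner — can be illustrated with a deliberately tiny toy model. Everything below is a hypothetical sketch, not the paper's implementation: the "planner" is a noisy scalar sampler standing in for diffusion-based subgoal generation, the "controller" is a step-limited 1D tracker, and distillation is a simple pull of the planner's bias toward the subgoals that produced the lowest final error.

```python
import random

random.seed(0)
GOAL, NOISE = 1.0, 0.3  # toy task: reach position 1.0; planner sampling noise

def rollout(bias, max_step=0.3, steps=5):
    """One episode: planner proposes a subgoal, controller chases it.

    The noisy draw stands in for diffusion-based subgoal sampling; the
    step-size limit stands in for the low-level controller's capabilities.
    """
    subgoal = GOAL + bias + random.gauss(0.0, NOISE)
    pos = 0.0
    for _ in range(steps):
        pos += max(-max_step, min(max_step, subgoal - pos))  # clipped step
    return subgoal, abs(pos - GOAL)  # final error is the environment feedback

def refine(bias=1.0, rounds=15, batch=40, top_k=5, lr=0.5):
    """Iterative refinement loop (hypothetical sketch of the cycle).

    The planner starts with an infeasibility bias of +1.0, i.e. it proposes
    subgoals the step-limited controller cannot reach in time. Each round it
    discovers which sampled subgoals actually worked and distills them back,
    implicitly grounding the planner in the controller's capabilities.
    """
    for _ in range(rounds):
        samples = sorted((rollout(bias) for _ in range(batch)),
                         key=lambda s: s[1])
        best = [sg for sg, _ in samples[:top_k]]  # self-discovered successes
        # Distill: move the planner toward subgoals the controller could reach.
        bias += lr * (sum(best) / len(best) - GOAL - bias)
    return bias
```

Running `refine()` drives the planner's bias from 1.0 toward roughly zero, and the mean rollout error drops accordingly, without ever fitting an explicit surrogate model of the controller — the grounding emerges from the discover-and-distill loop itself, which is the mechanism the abstract attributes to HD-ExpIt.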
Problem

Research questions and friction points this paper is trying to address.

hierarchical policies
language-conditioned manipulation
subgoal planning
controller limitations
planner-controller mismatch
Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical diffusion policies
iterative on-policy refinement
language-conditioned manipulation
diffusion-based planning
self-reinforcing training loop