🤖 AI Summary
This work addresses the inherent trade-off in robotic imitation learning between modeling long-horizon dependencies and achieving fine-grained closed-loop control. To this end, we propose a hierarchical multi-frequency action chunking framework that jointly predicts action sequences at different temporal frequencies. By aligning and fusing features across these frequencies, the approach simultaneously enables high-level planning and low-level precise control. An entropy-guided adaptive execution mechanism is further introduced to dynamically balance long-term strategic consistency with immediate responsiveness based on action uncertainty. The framework seamlessly integrates with existing 2D and 3D generative policies and demonstrates significant improvements in both performance and efficiency across a variety of simulated and real-world manipulation tasks, confirming its generality and effectiveness.
📝 Abstract
Robotic imitation learning faces a fundamental trade-off between modeling long-horizon dependencies and enabling fine-grained closed-loop control. Existing fixed-frequency action chunking approaches struggle to achieve both. Building on this insight, we propose HiPolicy, a hierarchical multi-frequency action chunking framework that jointly predicts action sequences at different frequencies to capture both coarse high-level plans and precise reactive motions. We extract and fuse hierarchical features from historical observations aligned to each frequency for multi-frequency chunk generation, and introduce an entropy-guided execution mechanism that adaptively balances long-horizon planning with fine-grained control based on action uncertainty. Experiments on diverse simulated benchmarks and real-world manipulation tasks show that HiPolicy can be seamlessly integrated into existing 2D and 3D generative policies, delivering consistent performance improvements while significantly enhancing execution efficiency.
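The abstract does not spell out how the entropy-guided execution mechanism chooses between long-horizon and reactive behavior, so the following is only a minimal illustrative sketch, not HiPolicy's actual method. It assumes a generative policy from which `K` candidate action chunks can be sampled, approximates per-timestep uncertainty with a Gaussian entropy proxy over the samples, and the threshold and horizon lengths are hypothetical parameters:

```python
import numpy as np

def entropy_guided_horizon(action_samples, threshold=0.5,
                           long_horizon=16, short_horizon=4):
    """Pick an execution horizon from sampled action chunks.

    action_samples: array of shape (K, T, D) -- K candidate chunks of
    length T with D-dimensional actions, sampled from a generative
    policy. Uncertainty at each timestep is approximated (up to an
    additive constant) by the entropy of a diagonal Gaussian fit
    across the K samples:  H_t ~ 0.5 * sum_d log Var_k(a[k, t, d]).
    NOTE: this proxy and the fixed threshold are assumptions for
    illustration; the paper's formulation may differ.
    """
    var = action_samples.var(axis=0)            # (T, D) per-dim variance
    ent = 0.5 * np.log(var + 1e-8).sum(axis=-1) # (T,) entropy proxy
    # Low average uncertainty -> commit to the full long-horizon chunk;
    # high uncertainty -> execute only a short chunk and replan sooner.
    if ent[:long_horizon].mean() < threshold:
        horizon = long_horizon
    else:
        horizon = short_horizon
    consensus = action_samples.mean(axis=0)     # (T, D) consensus chunk
    return consensus[:horizon]
```

In this sketch, confident (low-entropy) predictions are executed open-loop over the long chunk, while uncertain predictions fall back to short chunks with frequent replanning, which mirrors the trade-off between long-term consistency and immediate responsiveness described above.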