🤖 AI Summary
Action chunking improves execution efficiency in robotic imitation learning but creates a tension between long-horizon consistency and short-term reactivity: open-loop execution ignores state changes, while closed-loop replanning disrupts temporal dependencies. To address this, the authors propose Bidirectional Decoding (BID), a test-time inference algorithm that bridges action chunking with closed-loop adaptation. At each timestep, BID samples multiple candidate action chunks and selects one by jointly scoring backward coherence (agreement with previously committed decisions) and forward contrast (preference for samples with high likelihood as future plans). By coupling decisions within and across chunks, BID achieves both long-term consistency and short-term reactivity without retraining. Evaluated on seven simulation benchmarks and two real-world tasks, BID consistently improves two state-of-the-art generative policies. Code and demonstration videos are publicly available.
📝 Abstract
Predicting and executing a sequence of actions without intermediate replanning, known as action chunking, is increasingly used in robot learning from human demonstrations. Yet, its effects on the learned policy remain inconsistent: some studies find it crucial for achieving strong results, while others observe decreased performance. In this paper, we first dissect how action chunking impacts the divergence between a learner and a demonstrator. We find that action chunking allows the learner to better capture the temporal dependencies in demonstrations but at the cost of reduced reactivity to unexpected states. To address this tradeoff, we propose Bidirectional Decoding (BID), a test-time inference algorithm that bridges action chunking with closed-loop adaptation. At each timestep, BID samples multiple candidate predictions and searches for the optimal one based on two criteria: (i) backward coherence, which favors samples that align with previous decisions; (ii) forward contrast, which seeks samples of high likelihood for future plans. By coupling decisions within and across action chunks, BID promotes both long-term consistency and short-term reactivity. Experimental results show that our method boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks. Code and videos are available at https://bid-robot.github.io.
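The selection step the abstract describes, sampling several candidate chunks and picking one by combining backward coherence with forward contrast, can be sketched in a few lines. This is an illustrative approximation, not the paper's implementation: the function name `bid_select`, the weight `rho`, and the use of distance-to-batch-mean as a stand-in for the likelihood-based forward criterion are all assumptions made here for clarity.

```python
import numpy as np

def bid_select(candidates, prev_chunk, overlap, rho=1.0):
    """Pick one action chunk from sampled candidates (illustrative sketch).

    candidates: (N, H, D) array of N sampled chunks, horizon H, action dim D.
    prev_chunk: (H, D) chunk committed at the previous timestep.
    overlap:    number of steps the new chunk shares with the previous one.
    rho:        hypothetical weight between the two criteria.
    """
    # Backward coherence: one action of the previous chunk has already been
    # executed, so a candidate's first `overlap` steps should agree with the
    # remaining tail of the previous chunk.
    prev_tail = prev_chunk[1:1 + overlap]
    back = np.linalg.norm(candidates[:, :overlap] - prev_tail, axis=(1, 2))

    # Forward contrast, crudely proxied by distance to the batch mean plan;
    # the paper instead favors samples of high likelihood as future plans.
    mean_plan = candidates.mean(axis=0)
    fwd = np.linalg.norm(candidates - mean_plan, axis=(1, 2))

    # Lower combined score wins.
    return int(np.argmin(back + rho * fwd))
```

Because the search happens purely at inference time over sampled candidates, a selector like this can wrap any stochastic policy that emits action chunks, which is why the abstract can report gains on two different generative policies without retraining either.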