Efficient Robotic Policy Learning via Latent Space Backward Planning

📅 2025-05-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing long-horizon robotic planning methods rely on pixel-level forward prediction, which incurs high computational overhead and accumulates errors that degrade task fidelity. To address these challenges, the authors propose Latent-space Backward Planning (LBP): starting from a latent representation of the final goal, LBP recursively generates a sequence of semantic subgoals in latent space and conditions the policy network on them via a learnable subgoal token, enabling end-to-end joint optimization. LBP presents the first planning paradigm to integrate latent-space representation, backward recursion, and subgoal conditioning, achieving real-time inference while significantly improving long-horizon task alignment. Evaluated in both simulation and real-robot settings, LBP achieves state-of-the-art success rates on multi-stage tasks with superior inference speed, validating that efficiency and accuracy can be unified.

📝 Abstract
Current robotic planning methods often rely on predicting multi-frame images with full pixel details. While this fine-grained approach can serve as a generic world model, it introduces two significant challenges for downstream policy learning: substantial computational costs that hinder real-time deployment, and accumulated inaccuracies that can mislead action extraction. Planning with coarse-grained subgoals partially alleviates the efficiency issue. However, forward planning schemes can still produce off-task predictions due to accumulated errors, leading to misalignment with long-term goals. This raises a critical question: can robotic planning be both efficient and accurate enough for real-time control in long-horizon, multi-stage tasks? To address this, we propose a Latent Space Backward Planning scheme (LBP), which begins by grounding the task into final latent goals, followed by recursively predicting intermediate subgoals closer to the current state. The grounded final goal enables backward subgoal planning to remain aware of task completion at every step, facilitating on-task prediction along the entire planning horizon. The subgoal-conditioned policy incorporates a learnable token to summarize the subgoal sequence and determine how each subgoal guides action extraction. Through extensive simulation and real-robot long-horizon experiments, we show that LBP outperforms existing fine-grained and forward planning methods, achieving SOTA performance. Project Page: https://lbp-authors.github.io
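The backward recursion described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and the learned latent subgoal predictor is replaced by a simple midpoint in latent space purely as a stand-in. The key structural idea it shows is the recursion direction: planning starts from the final goal latent and generates each new subgoal closer to the current state, so every prediction stays anchored to task completion.

```python
import numpy as np

def predict_subgoal(z_cur, z_goal):
    # Stand-in for a learned latent subgoal predictor (hypothetical):
    # here just the latent-space midpoint, for illustration only.
    return 0.5 * (z_cur + z_goal)

def backward_plan(z_cur, z_goal, depth=3):
    """Recursively plan from the final goal latent back toward the
    current state latent; returns subgoals ordered nearest-first."""
    subgoals = [z_goal]
    target = z_goal
    for _ in range(depth):
        target = predict_subgoal(z_cur, target)  # step closer to z_cur
        subgoals.append(target)
    return subgoals[::-1]  # nearest subgoal first, final goal last

# Example: planning in a toy 2-D latent space.
plan = backward_plan(np.zeros(2), np.ones(2), depth=2)
```

With the midpoint stand-in, each recursion halves the distance to the current state, so the plan densifies near the present, which is the behavior the paper attributes to backward planning: coarse far-horizon subgoals, fine near-horizon ones.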
Problem

Research questions and friction points this paper is trying to address.

Reduce computational costs in robotic policy learning
Minimize accumulated inaccuracies in action extraction
Align long-term goals with efficient real-time control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Space Backward Planning for efficiency
Recursive intermediate subgoal prediction for accuracy
Learnable token for subgoal sequence summarization
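The third point, a learnable token summarizing the subgoal sequence, can be read as a single attention read: the token acts as a query over the subgoal latents, and the resulting weighted summary conditions the policy. The sketch below is an assumption about the mechanism, not the authors' code; the attention form and all names are illustrative.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def summarize_subgoals(subgoals, token):
    """One attention read: a learnable token (query) attends over the
    subgoal sequence (keys/values) and returns a summary vector that
    could condition a policy network. Illustrative only."""
    S = np.stack(subgoals)                    # (K, d) subgoal latents
    scale = np.sqrt(S.shape[1])               # standard dot-product scaling
    weights = softmax(S @ token / scale)      # relevance of each subgoal
    return weights @ S                        # convex combination, shape (d,)

# Example: a token aligned with the first subgoal weights it more heavily.
subgoals = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
summary = summarize_subgoals(subgoals, token=np.array([1.0, 0.0]))
```

Because the weights are a softmax, the summary is a convex combination of the subgoals; in a trained model the token's parameters would determine how much each subgoal, near or far, steers action extraction.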