Fast ECoT: Efficient Embodied Chain-of-Thought via Thoughts Reuse

📅 2025-06-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Embodied Chain-of-Thought (ECoT) reasoning enhances both the performance and interpretability of Vision-Language-Action (VLA) models, yet its autoregressive token generation incurs high inference latency, hindering real-time robotic deployment. To address this, we propose a lightweight, model-agnostic optimization framework that requires no architectural modification or retraining. Our method introduces (1) a novel *thought caching and reuse mechanism* that exploits structural redundancy across reasoning traces to dynamically reuse historical chains of thought, and (2) a *modular parallel token generation scheme* coupled with an *asynchronous scheduler* that decouples reasoning from action decoding. Fully compatible with existing VLA systems, our approach achieves up to a 7.5% reduction in end-to-end inference latency on both LIBERO simulation benchmarks and real-robot tasks, while preserving or improving task success rates and reasoning fidelity.
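The thought-caching idea above can be sketched roughly as follows. This is a minimal illustration, not the paper's actual interface: `ThoughtCache`, the observation embeddings, and the pluggable similarity function are all assumed names introduced here for clarity.

```python
# Illustrative sketch of caching and reusing a chain-of-thought across
# timesteps: if the new observation is similar enough to the one that
# produced the cached thought, skip regeneration and reuse it.
# All names here are hypothetical, not the paper's API.

class ThoughtCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold   # similarity needed to reuse the cached chain
        self.last_embedding = None   # embedding of the observation that produced it
        self.last_thought = None     # cached chain-of-thought text

    def lookup(self, embedding, similarity):
        """Return the cached thought if the new observation is similar enough."""
        if (self.last_embedding is not None
                and similarity(embedding, self.last_embedding) >= self.threshold):
            return self.last_thought
        return None

    def store(self, embedding, thought):
        """Record the thought generated for this observation."""
        self.last_embedding = embedding
        self.last_thought = thought
```

In a policy loop, a cache hit would bypass the slow autoregressive reasoning pass entirely; on a miss, the fresh chain of thought is generated and stored for future timesteps.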

πŸ“ Abstract
Embodied Chain-of-Thought (ECoT) reasoning enhances vision-language-action (VLA) models by improving performance and interpretability through intermediate reasoning steps. However, its sequential autoregressive token generation introduces significant inference latency, limiting real-time deployment. We propose Fast ECoT, an inference-time acceleration method that exploits the structured and repetitive nature of ECoT to (1) cache and reuse high-level reasoning across timesteps and (2) parallelise the generation of modular reasoning steps. Additionally, we introduce an asynchronous scheduler that decouples reasoning from action decoding, further boosting responsiveness. Fast ECoT requires no model changes or additional training and integrates easily into existing VLA pipelines. Experiments in both simulation (LIBERO) and real-world robot tasks show up to a 7.5% reduction in latency with comparable or improved task success rate and reasoning faithfulness, bringing ECoT policies closer to practical real-time deployment.
Problem

Research questions and friction points this paper is trying to address.

Reduces inference latency in ECoT reasoning for real-time deployment
Enables reuse and parallelisation of modular reasoning steps
Decouples reasoning from action decoding to enhance responsiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cache and reuse high-level reasoning steps
Parallelise the generation of modular reasoning steps
Decouple reasoning from action decoding asynchronously
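The third contribution, decoupling reasoning from action decoding, can be sketched with a background worker: the action loop always decodes with the latest completed chain of thought rather than blocking on a fresh one each timestep. The function names (`generate_thought`, `decode_action`) are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch of an asynchronous scheduler: slow chain-of-thought
# generation runs in a background thread, while the fast action decoder
# proceeds each timestep using the most recent finished thought
# (which may lag the current observation, or be None at startup).
import threading

def async_policy_loop(generate_thought, decode_action, observations):
    """Run reasoning asynchronously; decode actions with the latest thought."""
    latest = {"thought": None}
    lock = threading.Lock()

    def reasoner(obs):
        thought = generate_thought(obs)   # slow autoregressive reasoning
        with lock:
            latest["thought"] = thought

    actions = []
    worker = None
    for obs in observations:
        # Launch a new reasoning job only when the previous one has finished.
        if worker is None or not worker.is_alive():
            worker = threading.Thread(target=reasoner, args=(obs,))
            worker.start()
        with lock:
            thought = latest["thought"]
        actions.append(decode_action(obs, thought))  # fast action decoding
    if worker is not None:
        worker.join()
    return actions
```

The key design point is that action decoding never waits on reasoning: stale-but-recent thoughts are tolerated in exchange for responsiveness, which matches the summary's claim of boosting end-to-end latency without model changes.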