🤖 AI Summary
To address persistent logical errors and hallucinations in long-horizon embodied task planning, which are exacerbated by the scarcity of high-quality demonstration data, this paper proposes ReLEP, a demonstration-free framework for real-time long-horizon planning. Methodologically, ReLEP introduces (i) implicit logical reasoning learned through fine-tuning, jointly with hallucination suppression; (ii) a skill-function planning paradigm in which a fine-tuned vision-language model maps abstract instructions to executable skill sequences; and (iii) a logic-aware synthetic data generation pipeline, complemented by a recallable Memory module and a Robot Configuration module for adaptation across heterogeneous robot hardware. Evaluated on diverse long-horizon tasks, ReLEP significantly outperforms state-of-the-art methods, achieving high success rates and execution compliance on both seen and unseen tasks while mitigating logical inconsistencies and factual hallucinations.
📝 Abstract
Long-horizon embodied planning underpins embodied AI. One of the most practical ways to accomplish a long-horizon task is to decompose an abstract instruction into a sequence of actionable steps. However, foundation models still suffer from logical errors and hallucinations in long-horizon planning unless they are given examples highly relevant to the task at hand, and supplying such examples for arbitrary tasks is impractical. We therefore present ReLEP, a novel framework for Real-time Long-horizon Embodied Planning. ReLEP can complete a wide range of long-horizon tasks without in-context examples by learning implicit logical inference through fine-tuning. The fine-tuned large vision-language model formulates each plan as a sequence of skill functions selected from a carefully designed skill library. ReLEP is further equipped with a Memory module for plan and status recall and a Robot Configuration module for versatility across robot types. In addition, we propose a data generation pipeline to address dataset scarcity; by encoding implicit logical relationships during dataset construction, the pipeline enables the model to learn these relationships and dispel hallucinations. In comprehensive evaluations across diverse long-horizon tasks, ReLEP achieves high success rates and execution compliance even on unseen tasks, outperforming state-of-the-art baseline methods.
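The core planning idea in the abstract, a plan expressed as a sequence of skill functions chosen from a fixed skill library, can be sketched as follows. This is a minimal illustrative sketch, not ReLEP's actual API: the skill names (`navigate_to`, `pick_up`, `place_on`), the registry decorator, and the plan format are all hypothetical assumptions.

```python
# Hypothetical sketch of skill-function planning. All skill names and the
# dispatch mechanism below are illustrative assumptions, not ReLEP's real API.

SKILL_LIBRARY = {}

def skill(fn):
    """Register a function in the skill library under its own name."""
    SKILL_LIBRARY[fn.__name__] = fn
    return fn

@skill
def navigate_to(target):
    return f"navigated to {target}"

@skill
def pick_up(obj):
    return f"picked up {obj}"

@skill
def place_on(obj, surface):
    return f"placed {obj} on {surface}"

def execute_plan(plan):
    """Run a plan: a list of (skill_name, args) tuples, i.e. the kind of
    sequence a fine-tuned vision-language model might emit."""
    log = []
    for name, args in plan:
        log.append(SKILL_LIBRARY[name](*args))
    return log

# A possible decomposition of "put the apple on the table":
plan = [
    ("navigate_to", ("kitchen",)),
    ("pick_up", ("apple",)),
    ("navigate_to", ("table",)),
    ("place_on", ("apple", "table")),
]
```

Constraining the planner's output to names in `SKILL_LIBRARY` is one way such a design can keep generated plans executable: the model selects from a closed set of grounded skills rather than free-form text.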