🤖 AI Summary
To address the low task success rates and erroneous code generation of large language model (LLM)-driven "code-as-policy" approaches, which stem from weak environmental grounding in dynamic, partially observable settings, this paper proposes a neuro-symbolic collaborative framework. The framework tightly integrates LLM-based code generation with symbolic logic verification and adds an interactive, exploration-driven state perception and online correction module, improving policy adaptability to environmental dynamics and observational uncertainty. Evaluated on the RLBench benchmark and in real-world scenarios, the method improves task success rates by 46.2% over baselines and achieves an executable action rate above 86.8%. Its core contribution is the first deep coupling of symbolic verification with exploration-guided code generation, enabling reliable, formally verifiable policy synthesis for embodied agents operating in open, dynamic environments.
📝 Abstract
Recent advances in large language models (LLMs) have enabled the automatic generation of executable code for task planning and control in embodied agents such as robots, demonstrating the potential of LLM-based embodied intelligence. However, these LLM-based code-as-policies approaches often suffer from limited environmental grounding, particularly in dynamic or partially observable settings, and the resulting incorrect or incomplete code lowers task success rates. In this work, we propose a neuro-symbolic embodied task planning framework that incorporates explicit symbolic verification and interactive validation during code generation. In the validation phase, the framework generates exploratory code that actively interacts with the environment to acquire missing observations while preserving task-relevant state. This integrated process strengthens the grounding of the generated code, yielding higher task reliability and success rates in complex environments. We evaluate our framework on RLBench and in real-world settings across dynamic, partially observable scenarios. Experimental results show that our framework improves task success rates by 46.2% over Code-as-Policies baselines and achieves over 86.8% executability of task-relevant actions, thereby enhancing the reliability of task planning in dynamic environments.
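The abstract's verify-then-explore loop can be pictured with a minimal sketch: an LLM-generated policy step executes only after its symbolic preconditions are observed to hold; otherwise an exploratory action runs first to acquire the missing observation. All names here (`preconditions_hold`, `run_policy`, the predicate strings) are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of symbolic verification gating code-as-policy execution.
# Each plan step carries an action plus the symbolic predicates it requires.

def preconditions_hold(step, state):
    """Symbolic check: every required predicate must be known true."""
    return all(state.get(pred) is True for pred in step["requires"])

def run_policy(plan, state, explore, max_probes=3):
    """Execute verified steps; explore to fill missing observations first."""
    executed = []
    for step in plan:
        # Bounded exploration: probe the environment until the step's
        # preconditions are observed, rather than running unverified code.
        for _ in range(max_probes):
            if preconditions_hold(step, state):
                break
            state.update(explore(step, state))  # acquire missing observations
        if preconditions_hold(step, state):
            executed.append(step["action"])
        else:
            break  # fail closed: stop before an unverifiable step
    return executed
```

For example, a grasp step blocked on an unseen object would trigger a look-around exploration that adds `cup_visible` to the state, after which the grasp is cleared to run; a later step whose predicate is never observed halts the policy instead of executing blindly.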