🤖 AI Summary
Large language models (LLMs) generate robot manipulation code with low reliability, while validation via physical experimentation or high-fidelity simulation incurs prohibitive cost and time overhead.
Method: This paper introduces the first LLM-driven static textual simulation framework. Instead of executing code, it simulates execution purely at the textual level via semantic parsing, state-transition reasoning, and dynamic trajectory modeling, augmented by feedback-driven correction for automated error detection and optimization.
Contribution/Results: The core innovation is treating the LLM as a general-purpose "static executor" capable of action parsing, implicit state inference, and outcome attribution. Evaluated across diverse robot manipulation tasks, the framework achieves 92.4% static simulation accuracy, matches state-of-the-art dynamic methods in code generation quality, and entirely eliminates dependence on physical experiments or custom simulation environments.
📄 Abstract
Recent advances in large language models (LLMs) have demonstrated promising capabilities for generating robot operation code, enabling LLM-driven robots. To enhance the reliability of the operation code that LLMs generate, existing research has increasingly adopted corrective designs that feed back observations from executing the code. However, code execution in these designs relies on either a physical experiment or a customized simulation environment, which limits their deployment due to the high effort of configuring the environment and potentially long execution times. In this paper, we explore the possibility of directly leveraging an LLM to statically simulate robot operation code, and we use this capability to design a new, reliable LLM-driven corrective framework for robot operation code generation. Our framework configures the LLM as a static simulator with enhanced capabilities that reliably simulate robot code execution by interpreting actions, reasoning over state transitions, analyzing execution outcomes, and generating semantic observations that accurately capture trajectory dynamics. To validate the performance of our framework, we performed experiments on various operation tasks for different robots, including UAVs and small ground vehicles. The experimental results demonstrate not only the high accuracy of our static, text-based simulation but also the reliable code generation of our LLM-driven corrective framework, which achieves performance comparable to state-of-the-art research while not relying on dynamic code execution in physical experiments or simulators.
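The corrective design described above (generate code, let the LLM statically simulate it at the textual level, and feed the semantic observations back until the simulation reports success) can be sketched as a simple loop. This is a minimal illustration, not the paper's implementation: `generate` and `simulate` are hypothetical stand-ins for the framework's LLM prompts, and the string-based interfaces are assumptions.

```python
def corrective_codegen(task, generate, simulate, max_rounds=3):
    """Corrective generation loop with a static textual simulator.

    `generate(task, feedback)` -> code string (feedback is None on the
    first attempt); `simulate(code)` -> (ok, observation), where
    `observation` is the semantic trajectory/error description that the
    static simulator produces instead of actually executing the code.
    Both callables are hypothetical stand-ins for LLM calls.
    """
    feedback = None
    code = generate(task, feedback)
    for _ in range(max_rounds):
        ok, observation = simulate(code)
        if ok:
            return code, observation
        # The simulator's error report becomes feedback for the next round.
        feedback = observation
        code = generate(task, feedback)
    return code, feedback
```

No robot or simulator environment is involved: both the generator and the "executor" are text-in/text-out, which is what removes the configuration and runtime cost of dynamic execution.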