🤖 AI Summary
To address incomplete environmental modeling, difficulty in maintaining dynamic memory, and weak generalization in long-term service robot deployment, this paper proposes the L3M+P framework: a synergistic integration of large language models (LLMs) and classical planning, grounded in a continuously updated knowledge graph that serves as a unified world-state representation. The authors design a rule-constrained multimodal graph update mechanism that supports natural-language instruction understanding, cross-task memory inheritance, and state-consistency maintenance. The framework then generates formal planning problems end-to-end from linguistic and perceptual inputs. Evaluations on both simulation and real-robot platforms show significant improvements over existing baselines in natural-language-driven state modeling accuracy and planning success rate. To the authors' knowledge, L3M+P is the first combined LLM and classical-planning architecture explicitly designed for lifelong task execution.
📝 Abstract
By combining classical planning methods with large language models (LLMs), recent research such as LLM+P has enabled agents to plan for general tasks given in natural language. However, scaling these methods to general-purpose service robots remains challenging: (1) classical planning algorithms generally require a detailed and consistent specification of the environment, which is not always readily available; and (2) existing frameworks mainly focus on isolated planning tasks, whereas robots are often meant to serve in long-term continuous deployments, and therefore must maintain a dynamic memory of the environment which can be updated with multi-modal inputs and extracted as planning knowledge for future tasks. To address these two issues, this paper introduces L3M+P (Lifelong LLM+P), a framework that uses an external knowledge graph as a representation of the world state. The graph can be updated from multiple sources of information, including sensory input and natural language interactions with humans. L3M+P enforces rules for the expected format of the absolute world state graph to maintain consistency between graph updates. At planning time, given a natural language description of a task, L3M+P retrieves context from the knowledge graph and generates a problem definition for classical planners. Evaluated on household robot simulators and on a real-world service robot, L3M+P achieves significant improvement over baseline methods both on accurately registering natural language state changes and on correctly generating plans, thanks to the knowledge graph retrieval and verification.
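The pipeline the abstract describes, a knowledge graph holding the world state that is updated from natural-language events and then serialized into a problem definition for a classical planner, can be illustrated with a minimal sketch. All names here (`KnowledgeGraph`, `to_pddl_problem`, the example objects) are illustrative assumptions, not the paper's actual API or data model:

```python
# Minimal sketch (assumed names, not L3M+P's real interface): a triple-store
# knowledge graph from which a PDDL problem definition is generated.

class KnowledgeGraph:
    """World state as a set of (subject, relation, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subj, rel, obj):
        self.triples.add((subj, rel, obj))

    def remove(self, subj, rel, obj):
        # Discarding the stale fact keeps the graph consistent after an update.
        self.triples.discard((subj, rel, obj))


def to_pddl_problem(kg, goal_triples, domain="household"):
    """Serialize the graph's facts into a PDDL problem string."""
    objects = sorted({s for s, _, _ in kg.triples} | {o for _, _, o in kg.triples})
    init = " ".join(f"({r} {s} {o})" for s, r, o in sorted(kg.triples))
    goal = " ".join(f"({r} {s} {o})" for s, r, o in sorted(goal_triples))
    return (
        f"(define (problem task) (:domain {domain})\n"
        f"  (:objects {' '.join(objects)})\n"
        f"  (:init {init})\n"
        f"  (:goal (and {goal})))"
    )


# Example: the robot is told "the mug is now on the table", so the old fact is
# retracted and the new one registered; the next task plans from the updated state.
kg = KnowledgeGraph()
kg.add("mug", "on", "counter")
kg.remove("mug", "on", "counter")   # state change reported in natural language
kg.add("mug", "on", "table")
problem = to_pddl_problem(kg, [("mug", "in", "cabinet")])
```

In the actual framework the graph updates are extracted by an LLM under format rules and the retrieval step selects only task-relevant context; this sketch only shows the shape of the graph-to-problem translation.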