🤖 AI Summary
To address the challenges of natural language instruction understanding and subtask allocation in multi-robot systems for multi-object retrieval, where each robot holds only localized field knowledge, this paper proposes a collaborative task decomposition framework that integrates large language models (LLMs) with locally learned spatial concept models. The method employs a few-shot prompting strategy that enables the LLM to infer implicit goals from ambiguous instructions (e.g., “prepare for field survey”) and generate semantically coherent subtasks; it then uses the heterogeneous spatial concepts held by individual robots to achieve context-aware task allocation. In evaluation, the framework correctly allocated subtasks in 47 of 50 multi-object retrieval trials, substantially outperforming random allocation (28/50) and a commonsense-based baseline (26/50). End-to-end validation on real mobile manipulators confirmed practical feasibility and robustness.
📝 Abstract
It is crucial to efficiently execute instructions such as "Find an apple and a banana" or "Get ready for a field trip," which require searching for multiple objects or understanding context-dependent commands. This study addresses the challenging problem of determining which robot should be assigned to which part of a task when each robot possesses different situational on-site knowledge: specifically, spatial concepts learned from the area designated to it by the user. We propose a task planning framework that leverages large language models (LLMs) and spatial concepts to decompose natural language instructions into subtasks and allocate them to multiple robots. We designed a novel few-shot prompting strategy that enables LLMs to infer required objects from ambiguous commands and decompose them into appropriate subtasks. In our experiments, the proposed method achieved 47/50 successful assignments, outperforming random (28/50) and commonsense-based assignment (26/50). Furthermore, we conducted qualitative evaluations using two actual mobile manipulators. The results demonstrated that our framework could handle instructions, including those involving ad hoc categories such as "Get ready for a field trip," by successfully performing task decomposition, assignment, sequential planning, and execution.
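To make the two-stage pipeline concrete, the sketch below illustrates the general shape of such a system: a few-shot prompt that asks an LLM to decompose an ambiguous instruction into object-retrieval subtasks, followed by allocation of each subtask to the robot whose locally learned knowledge best covers the target object. This is a minimal illustration, not the paper's actual method: the example instructions, the `FEW_SHOT_EXAMPLES` list, and the keyword-overlap scoring in `allocate` are all hypothetical stand-ins (the paper allocates using probabilistic spatial concept models, not string matching).

```python
# Hypothetical sketch of LLM-based task decomposition and subtask allocation.
# FEW_SHOT_EXAMPLES and the keyword-based scoring are illustrative assumptions,
# not the paper's actual prompts or spatial concept models.

FEW_SHOT_EXAMPLES = [
    ("Find an apple and a banana",
     ["find an apple", "find a banana"]),
    ("Get ready for a field trip",
     ["find a backpack", "find a water bottle", "find a hat"]),
]

def build_decomposition_prompt(instruction: str) -> str:
    """Build a few-shot prompt asking an LLM to split an instruction
    (possibly an ad hoc category) into object-retrieval subtasks."""
    lines = ["Decompose the instruction into object-retrieval subtasks."]
    for example_instruction, subtasks in FEW_SHOT_EXAMPLES:
        lines.append(f"Instruction: {example_instruction}")
        lines.append("Subtasks: " + "; ".join(subtasks))
    lines.append(f"Instruction: {instruction}")
    lines.append("Subtasks:")  # the LLM completes this line
    return "\n".join(lines)

def allocate(subtasks: list[str], robot_concepts: dict[str, set[str]]) -> dict[str, str]:
    """Assign each subtask to the robot whose known objects best match it.
    Here 'spatial concepts' are reduced to per-robot object vocabularies;
    overlap is scored by naive substring matching for illustration only."""
    assignment = {}
    for task in subtasks:
        best_robot = max(
            robot_concepts,
            key=lambda robot: sum(obj in task for obj in robot_concepts[robot]),
        )
        assignment[task] = best_robot
    return assignment
```

A usage example under the same assumptions: if `robot_a` learned concepts in the kitchen (`{"apple", "banana"}`) and `robot_b` in the entryway (`{"backpack", "hat"}`), then `allocate(["find an apple", "find a hat"], ...)` sends the apple subtask to `robot_a` and the hat subtask to `robot_b`. In a real system, the prompt string would be sent to an LLM API and its completion parsed back into the subtask list.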