🤖 AI Summary
Long-horizon task planning for partially observable multi-robot environments lacks robustness, and large language models (LLMs) in their standard form do not support online adaptation or self-correction. Method: We propose LLaMAR, a cognitive architecture built on a plan-act-correct-verify closed loop that corrects plans online from action execution feedback, without relying on simulators or oracles. Contribution/Results: We introduce MAP-THOR, a test suite of multi-agent long-horizon household tasks of varying complexity within the AI2-THOR environment. Evaluated on MAP-THOR and Search & Rescue tasks, LLaMAR achieves a 30% higher task success rate than other state-of-the-art LLM-based multi-agent planners in partially observable settings.
📝 Abstract
The ability of Language Models (LMs) to understand natural language makes them a powerful tool for parsing human instructions into task plans for autonomous robots. Unlike traditional planning methods that rely on domain-specific knowledge and handcrafted rules, LMs generalize from diverse data and adapt to various tasks with minimal tuning, acting as a compressed knowledge base. However, LMs in their standard form face challenges with long-horizon tasks, particularly in partially observable multi-agent settings. We propose an LM-based Long-Horizon Planner for Multi-Agent Robotics (LLaMAR), a cognitive architecture for planning that achieves state-of-the-art results in long-horizon tasks within partially observable environments. LLaMAR employs a plan-act-correct-verify framework, allowing self-correction from action execution feedback without relying on oracles or simulators. Additionally, we present MAP-THOR, a comprehensive test suite encompassing household tasks of varying complexity within the AI2-THOR environment. Experiments show that LLaMAR achieves a 30% higher success rate than other state-of-the-art LM-based multi-agent planners in MAP-THOR and Search & Rescue tasks. Code can be found at https://github.com/nsidn98/LLaMAR
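To make the plan-act-correct-verify idea concrete, the following is a minimal toy sketch of such a closed loop. It is not the LLaMAR implementation: every name here (`ToyEnv`, `Memory`, `plan`, `act`, `correct`, `verify`, `run_episode`) is hypothetical, the four modules are hard-coded stubs where the paper uses LMs, and the single-agent fridge scenario is invented purely for illustration. The point is only the control flow: a failed action yields feedback, the corrector proposes a fix from that feedback, and a subtask is marked complete only after a verification step based on observations.

```python
from dataclasses import dataclass, field


class ToyEnv:
    """Hypothetical environment: an apple sits inside a closed fridge."""

    def __init__(self):
        self.fridge_open = False
        self.holding_apple = False

    def step(self, action):
        """Execute an action and return (success, textual feedback)."""
        if action == "open fridge":
            self.fridge_open = True
            return True, "fridge opened"
        if action == "take apple":
            if not self.fridge_open:
                return False, "fridge is closed"
            self.holding_apple = True
            return True, "apple taken"
        return False, f"unknown action: {action}"

    def observe(self):
        """Return what the agent can currently observe."""
        return {"holding_apple": self.holding_apple}


@dataclass
class Memory:
    open_subtasks: list
    completed: list = field(default_factory=list)
    failures: list = field(default_factory=list)


def plan(memory):
    """Planner stub: pick the next open subtask (an LM would propose these)."""
    return memory.open_subtasks[0]


def act(action, env):
    """Actor stub: execute the chosen action in the environment."""
    return env.step(action)


def correct(action, feedback):
    """Corrector stub: map failure feedback to a corrective action, if any."""
    if "closed" in feedback:
        return "open fridge"
    return None


def verify(subtask, observation):
    """Verifier stub: decide from observations whether the subtask is done."""
    return subtask == "take apple" and observation["holding_apple"]


def run_episode(env, subtasks, max_steps=10):
    memory = Memory(open_subtasks=list(subtasks))
    pending_fix = None
    for _ in range(max_steps):
        if not memory.open_subtasks:
            break
        subtask = plan(memory)
        # Prefer a corrective action left over from the previous failure.
        action = pending_fix or subtask
        pending_fix = None
        success, feedback = act(action, env)
        if not success:
            memory.failures.append((action, feedback))
            pending_fix = correct(action, feedback)
            continue
        # Only verified subtasks move from open to completed.
        if verify(subtask, env.observe()):
            memory.open_subtasks.remove(subtask)
            memory.completed.append(subtask)
    return memory


if __name__ == "__main__":
    result = run_episode(ToyEnv(), ["take apple"])
    print(result.completed, result.failures)
```

In this toy run the first attempt to take the apple fails, the corrector suggests opening the fridge, and the retry succeeds and is verified. No oracle or simulator rollback is involved: the loop reacts only to execution feedback and observations, which is the property the abstract highlights.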