Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration

📅 2024-05-23
🏛️ Annual Meeting of the Association for Computational Linguistics
📈 Citations: 35
Influential: 1
📄 PDF
🤖 AI Summary
To address the low reasoning efficiency of LLMs in embodied multi-agent collaboration and their excessive reliance on costly physical validation or self-reflection, which leads to high query overhead, this paper proposes ReAd, a novel reinforcement learning–inspired framework. ReAd couples advantage-function learning with LLM-based planning, giving agents the ability to judge whether a proposed action advances the task. It further extends advantage-weighted regression (AWR) theory, previously unexplored in multi-agent LLM settings, to enable collaborative, advantage-guided decision-making. Specifically, ReAd learns a sequence-level advantage function via critic regression and employs the LLM as an advantage-maximizing optimizer, realizing theoretically grounded, multi-agent advantage-weighted planning. Evaluated on Overcooked-AI and a difficult variant of RoCoBench, ReAd significantly improves task success rates while reducing both interaction steps and LLM query counts, demonstrating enhanced efficiency and robust task grounding.

📝 Abstract
Grounding the reasoning ability of large language models (LLMs) for embodied tasks is challenging due to the complexity of the physical world. In particular, LLM planning for multi-agent collaboration requires communication among agents or credit assignment as feedback to re-adjust the proposed plans and achieve effective coordination. However, existing methods that overly rely on physical verification or self-reflection suffer from excessive and inefficient querying of LLMs. In this paper, we propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. Specifically, we perform critic regression to learn a sequential advantage function from LLM-planned data, and then treat the LLM planner as an optimizer that generates actions maximizing the advantage function. This endows the LLM with the foresight to discern whether an action contributes to accomplishing the final task. We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems. Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents and query rounds of LLMs, demonstrating its high efficiency for grounding LLMs. More results are given at https://read-llm.github.io/.
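For context, the advantage-weighted regression the abstract builds on can be written, in its standard single-agent form (the paper's multi-agent extension differs in details), as a policy update weighted by the exponentiated advantage:

```latex
\max_{\pi}\ \mathbb{E}_{(s,a)\sim\mathcal{D}}
\Big[\log \pi(a \mid s)\,
\exp\!\Big(\tfrac{1}{\beta} A^{\pi_{\mathrm{old}}}(s,a)\Big)\Big],
\qquad
A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s),
```

where $\beta > 0$ is a temperature: actions whose advantage is positive are up-weighted, so the planner is nudged toward actions the critic expects to help complete the task.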
Problem

Research questions and friction points this paper is trying to address.

Efficient grounding of LLMs for embodied multi-agent collaboration
Reducing excessive LLM queries in physical verification and self-reflection
Improving plan refinement and coordination through reinforced advantage feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforced Advantage feedback for self-refinement
Critic regression learns sequential advantage function
LLM planner maximizes advantage function as optimizer
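The three contributions above can be sketched as a simple refinement loop: the LLM proposes a joint action, the learned advantage function (critic) scores it, and a low score is fed back as textual feedback for re-planning. This is a minimal illustration with invented names (`refine_plan`, `toy_planner`, `toy_advantage`, the threshold value); the paper's actual prompts, critic architecture, and acceptance rule may differ.

```python
# Hypothetical sketch of ReAd-style advantage feedback.
def refine_plan(plan_fn, advantage_fn, state, threshold=0.0, max_rounds=3):
    """Ask the planner for a joint action; accept it only if the learned
    advantage estimate clears the threshold, otherwise feed the score back."""
    feedback = None
    for _ in range(max_rounds):
        action = plan_fn(state, feedback)    # LLM proposes a joint action
        score = advantage_fn(state, action)  # critic scores its advantage
        if score >= threshold:               # foresight: expected to help
            return action, score
        feedback = f"advantage {score:.2f} below {threshold}; revise the plan"
    return action, score                     # fall back to the last proposal

# Toy stand-ins for the LLM planner and the learned critic.
def toy_planner(state, feedback):
    return "pass_onion" if feedback else "idle"

def toy_advantage(state, action):
    return 1.0 if action == "pass_onion" else -1.0

action, score = refine_plan(toy_planner, toy_advantage, state="kitchen")
print(action, score)  # the low-advantage "idle" is rejected, then revised
```

The key efficiency claim is visible even in this toy: rejection happens via a cheap critic call rather than a physical rollout or another full round of LLM self-reflection.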
👥 Authors
Yang Zhang
Tsinghua University, Shanghai AI Laboratory
Shixin Yang
Northwestern Polytechnical University, Shanghai AI Laboratory
Chenjia Bai
Institute of Artificial Intelligence (TeleAI), China Telecom
Reinforcement Learning · Robotics · Embodied AI
Fei Wu
Shanghai AI Laboratory, Zhejiang University
Xiu Li
Bytedance Seed
Computer Vision · Computer Graphics · 3D Vision
Xuelong Li
Shanghai AI Laboratory, Institute of Artificial Intelligence (TeleAI), China Telecom
Zhen Wang
Shanghai AI Laboratory, Northwestern Polytechnical University