🤖 AI Summary
To address the low reasoning efficiency of LLMs in embodied multi-agent collaboration and their heavy reliance on costly physical verification or self-reflection, which leads to high query overhead, this paper proposes ReAd (Reinforced Advantage feedback), a novel closed-loop planning framework grounded in reinforcement learning. ReAd couples advantage-function learning with LLM-based planning, giving agents the ability to judge whether a candidate action advances task completion. It further extends advantage-weighted regression (AWR) theory, previously unexplored in multi-agent LLM settings, to enable collaborative, advantage-guided decision-making. Specifically, ReAd learns a sequence-level advantage function via critic regression and employs the LLM as an advantage-maximizing optimizer, realizing theoretically grounded multi-agent advantage-weighted planning. Evaluated on Overcooked-AI and a difficult variant of RoCoBench, ReAd significantly improves task success rates while reducing both agent interaction steps and LLM query rounds, demonstrating efficient and robust grounding of LLM planners.
📝 Abstract
Grounding the reasoning ability of large language models (LLMs) in embodied tasks is challenging due to the complexity of the physical world. In particular, LLM planning for multi-agent collaboration requires communication among agents or credit assignment as feedback to re-adjust proposed plans and achieve effective coordination. However, existing methods that rely heavily on physical verification or self-reflection suffer from excessive and inefficient querying of LLMs. In this paper, we propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. Specifically, we perform critic regression to learn a sequence-level advantage function from LLM-planned data, and then treat the LLM planner as an optimizer that generates actions maximizing the advantage function. This endows the LLM with the foresight to discern whether an action contributes to accomplishing the final task. We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems. Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate while significantly decreasing the interaction steps of agents and the query rounds of LLMs, demonstrating its high efficiency for grounding LLMs. More results are available at https://read-llm.github.io/.
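The refinement loop described above, in which a learned advantage critic scores LLM-proposed actions and low-advantage proposals trigger a re-prompt, can be sketched as follows. This is a minimal illustration under assumed interfaces: `llm_propose`, `advantage_critic`, and the threshold are hypothetical names, not the paper's actual API.

```python
# Hedged sketch of a ReAd-style self-refinement loop. Function names,
# the feedback format, and the acceptance threshold are illustrative
# assumptions; only the overall loop mirrors the method described above.

def refine_plan(llm_propose, advantage_critic, state, threshold=0.0, max_rounds=5):
    """Query the LLM planner until the critic's estimated sequence-level
    advantage of the proposed joint action meets the threshold."""
    feedback = None
    actions = None
    for _ in range(max_rounds):
        # Joint action for all agents, conditioned on critic feedback (if any).
        actions = llm_propose(state, feedback)
        # Advantage estimate from a critic learned by regression on LLM-planned data.
        adv = advantage_critic(state, actions)
        if adv >= threshold:
            # Action is expected to contribute to finishing the task: accept it.
            return actions
        # Otherwise, return the advantage as textual feedback and re-plan,
        # avoiding any physical rollout or self-reflection round-trip.
        feedback = f"advantage={adv:.2f}; propose a joint action with higher advantage"
    return actions  # fall back to the last proposal after max_rounds
```

Because rejection and re-planning happen entirely against the learned critic, each refinement round costs one LLM query rather than an environment interaction, which is where the query- and step-efficiency gains come from.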