AI Summary
Existing LLM-based agents struggle to collaborate with users in open-ended dialogue: their clarification mechanisms are rigid, and reasoning, action, and dialogue are poorly coordinated. To address this, the authors propose a "Reason-Speak-Act" paradigm (ReSpAct): a goal-driven agent with no explicit dialogue schema that alternates dynamically among reasoning, natural language dialogue with the user, and environment interaction. Building on ReAct-style reasoning, the agent uses dialogue to interpret instructions, clarify goals, report status, recover from subtask failures, and revise plans based on user input. Compared with ReAct, the method improves task success rates by 6 and 4 percentage points (absolute) on ALFWorld and WebShop, respectively; on MultiWOZ, it raises the Inform score by 5.5% and the Success score by 3%. These results suggest that dynamic user-agent collaboration makes LLM agents more adaptable and more effective at resolving tasks in interactive settings.
Abstract
Large language model (LLM)-based agents are increasingly employed to interact with external environments (e.g., games, APIs, world models) to solve user-provided tasks. However, current frameworks often lack the ability to collaborate effectively with users in fully conversational settings. Conversations are essential for aligning on task details, achieving user-defined goals, and satisfying preferences. While existing agents address ambiguity through clarification questions, they underutilize the broader potential of an LLM's conversational capabilities. In this work, we introduce ReSpAct, an LLM-based agent designed to seamlessly integrate reasoning, decision-making, and dynamic dialogue for task-solving. Expanding on reasoning-first approaches like ReAct, ReSpAct employs active, free-flowing dialogues to interpret instructions, clarify goals, provide status updates, resolve subtask failures, and refine plans based on user inputs without any explicit dialogue schema. By alternating between task-solving actions and interactive conversations, ReSpAct demonstrates improved performance across diverse environments. We evaluate ReSpAct in user-interactive settings, including task-oriented dialogue systems (MultiWOZ) and decision-making tasks (ALFWorld, WebShop). ReSpAct outperforms ReAct with absolute success rate improvements of 6% and 4% in ALFWorld and WebShop, respectively, and achieves a 5.5% gain in Inform and a 3% gain in Success scores in MultiWOZ. These results highlight the value of integrating dynamic user-agent collaboration for more effective task resolution.
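The alternation between reasoning, speaking, and acting described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Step`, `run_episode`, `scripted_policy`, `toy_env`, and `toy_user` are hypothetical, the policy is a hand-written script standing in for an LLM, and the environment and user are toy stand-ins rather than ALFWorld, WebShop, or MultiWOZ.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical names throughout; nothing here is taken from the paper's code.
History = List[Tuple[str, str]]  # (kind, text) pairs in chronological order


@dataclass
class Step:
    kind: str     # "think", "speak", or "act"
    content: str  # a thought, an utterance to the user, or an environment command


def run_episode(
    policy: Callable[[History], Step],
    env: Callable[[str], str],
    user: Callable[[str], str],
    max_steps: int = 10,
) -> History:
    """Let the policy choose freely among thinking, speaking, and acting."""
    history: History = []
    for _ in range(max_steps):
        step = policy(history)
        history.append((step.kind, step.content))
        if step.kind == "think":
            continue  # thoughts only update the agent's own context
        if step.kind == "speak":
            # The utterance goes to the user; the reply becomes new context.
            history.append(("user", user(step.content)))
        elif step.kind == "act":
            # The command goes to the environment, which returns an observation.
            observation = env(step.content)
            history.append(("obs", observation))
            if observation == "done":
                break
        else:
            raise ValueError(f"unknown step kind: {step.kind}")
    return history


def scripted_policy(history: History) -> Step:
    """Stand-in for an LLM: reason first, ask once, then act on the answer."""
    kinds = [kind for kind, _ in history]
    if "think" not in kinds:
        return Step("think", "The task says to put the mug away, but not where.")
    if "user" not in kinds:
        return Step("speak", "Where should I put the mug?")
    answer = next(text for kind, text in history if kind == "user")
    return Step("act", f"put mug in {answer}")


def toy_env(command: str) -> str:
    """Toy environment: exactly one command completes the task."""
    return "done" if command == "put mug in cabinet" else "nothing happens"


def toy_user(question: str) -> str:
    """Toy user who always gives the same answer."""
    return "cabinet"


for kind, text in run_episode(scripted_policy, toy_env, toy_user):
    print(f"{kind}: {text}")
```

In this sketch the choice of step type is left entirely to the policy, which mirrors the abstract's point that dialogue is not governed by an explicit schema; a real agent would replace `scripted_policy` with a model call that conditions on the full history.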