ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents

📅 2024-11-01

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing LLM-based agents struggle to collaborate with users in fully conversational settings: they handle ambiguity mainly through clarification questions and make limited use of an LLM's broader conversational abilities. ReSpAct addresses this with a "Reason-Speak-Act" paradigm in which the agent alternates among reasoning, free-flowing dialogue with the user, and actions in the environment, without any explicit dialogue schema. Building on the reasoning-first ReAct framework, it uses dialogue to interpret instructions, clarify goals, report status, recover from subtask failures, and revise plans based on user input. Compared with ReAct, ReSpAct improves absolute task success rates by 6% on ALFWorld and 4% on WebShop, and raises Inform and Success scores on MultiWOZ by 5.5% and 3%, respectively. These results suggest that dynamic user-agent collaboration makes task resolution more effective.

📝 Abstract

Large language model (LLM)-based agents are increasingly employed to interact with external environments (e.g., games, APIs, world models) to solve user-provided tasks. However, current frameworks often lack the ability to collaborate effectively with users in fully conversational settings. Conversations are essential for aligning on task details, achieving user-defined goals, and satisfying preferences. While existing agents address ambiguity through clarification questions, they underutilize the broader potential of an LLM's conversational capabilities. In this work, we introduce ReSpAct, an LLM-based agent designed to seamlessly integrate reasoning, decision-making, and dynamic dialogue for task-solving. Expanding on reasoning-first approaches like ReAct, ReSpAct employs active, free-flowing dialogues to interpret instructions, clarify goals, provide status updates, resolve subtask failures, and refine plans based on user inputs without any explicit dialogue schema. By alternating between task-solving actions and interactive conversations, ReSpAct demonstrates improved performance across diverse environments. We evaluate ReSpAct in user-interactive settings, including task-oriented dialogue systems (MultiWOZ) and decision-making tasks (ALFWorld, WebShop). ReSpAct outperforms ReAct with absolute success rate improvements of 6% and 4% in ALFWorld and WebShop, respectively, and achieves a 5.5% gain in Inform and a 3% gain in Success scores in MultiWOZ. These results highlight the value of integrating dynamic user-agent collaboration for more effective task resolution.

Problem

Research questions and friction points this paper is trying to address.

Enhancing conversational AI agents for better user collaboration

Integrating reasoning, speaking, and acting in LLM-based agents

Improving task-solving through dynamic dialogue and user interaction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates reasoning, decision-making, and dynamic dialogue

Uses active, free-flowing dialogues for task-solving

Alternates between actions and interactive conversations
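The alternation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the step kinds (`think`, `speak`, `act`, `finish`), the `Step` and `Episode` types, and the `policy`, `env_step`, and `ask_user` callables are all hypothetical names chosen for illustration. In a real agent the policy would be an LLM call; here it is a fixed script so the control flow is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical step kinds; these names are illustrative, not from the paper.
THINK, SPEAK, ACT, FINISH = "think", "speak", "act", "finish"


@dataclass
class Step:
    kind: str      # one of THINK, SPEAK, ACT, FINISH
    content: str   # thought text, utterance, or environment command


@dataclass
class Episode:
    task: str
    # Chronological (role, text) pairs covering thoughts, dialogue, and observations.
    history: List[Tuple[str, str]] = field(default_factory=list)


def run_episode(
    task: str,
    policy: Callable[[Episode], Step],   # assumed stand-in for an LLM call
    env_step: Callable[[str], str],      # assumed environment interface
    ask_user: Callable[[str], str],      # assumed user channel
    max_steps: int = 20,
) -> Episode:
    """Alternate among reasoning, speaking, and acting until the policy finishes."""
    ep = Episode(task)
    for _ in range(max_steps):
        step = policy(ep)
        ep.history.append((step.kind, step.content))
        if step.kind == THINK:
            continue  # a thought only adds context for the next decision
        if step.kind == SPEAK:
            # Dialogue turn: the user's reply becomes part of the context.
            ep.history.append(("user", ask_user(step.content)))
        elif step.kind == ACT:
            # Environment turn: the observation becomes part of the context.
            ep.history.append(("observation", env_step(step.content)))
        elif step.kind == FINISH:
            break
        else:
            raise ValueError(f"unknown step kind: {step.kind}")
    return ep


def scripted_policy(script: List[Step]) -> Callable[[Episode], Step]:
    """Replay a fixed list of steps, then finish (a placeholder for an LLM)."""
    steps = iter(script)
    return lambda ep: next(steps, Step(FINISH, "done"))


def demo() -> Episode:
    script = [
        Step(THINK, "The task does not say which mug."),
        Step(SPEAK, "Which mug should I fetch?"),
        Step(ACT, "take red mug"),
        Step(FINISH, "done"),
    ]
    return run_episode(
        "fetch a mug",
        scripted_policy(script),
        env_step=lambda action: f"ok: {action}",
        ask_user=lambda question: "the red one",
    )
```

Calling `demo()` returns an episode whose history interleaves a thought, a question to the user, the user's reply, an action, and the resulting observation. Keeping all three kinds of turn in a single history is one simple way to let later decisions depend on earlier dialogue; the paper itself may organize context differently.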
