Do Large Language Models with Reasoning and Acting Meet the Needs of Task-Oriented Dialogue?

📅 2024-12-02
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) face challenges in task-oriented dialogue (TOD), particularly in multi-step decision-making and effective external knowledge invocation. Method: This work introduces the ReAct paradigm to TOD for the first time, systematically integrating reasoning–action co-prompting, simulated environment evaluation, real-user testing, and multi-dimensional human evaluation. Contribution/Results: ReAct-enhanced LLMs significantly improve subjective user satisfaction, dialogue naturalness, and robustness in real-world interactions. Although task completion rates remain slightly below current state-of-the-art (SOTA) systems, our findings reveal a fundamental trade-off between objective success metrics and holistic user experience. This challenges purely metric-driven evaluation paradigms and establishes a human-centered framework for both model design and assessment in TOD—paving the way for more usable, trustworthy, and user-aligned conversational agents.

📝 Abstract
Large language models (LLMs) have gained immense popularity due to their impressive capabilities in unstructured conversations. However, they underperform previous approaches in task-oriented dialogue (TOD), where reasoning and accessing external information are crucial. Empowering LLMs with advanced prompting strategies such as reasoning and acting (ReAct) has shown promise in solving complex tasks that traditionally required reinforcement learning. In this work, we apply the ReAct strategy to guide LLMs performing TOD. We evaluate ReAct-based LLMs (ReAct-LLMs) both in simulation and with real users. While ReAct-LLMs seem to underperform state-of-the-art approaches in simulation, human evaluation indicates a higher user satisfaction rate compared to handcrafted systems, despite a lower success rate.
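The ReAct strategy described above interleaves model-generated reasoning ("Thought") and tool calls ("Action") with environment feedback ("Observation") until the model emits a final answer. The sketch below illustrates that loop for a single TOD turn; the scripted `fake_llm`, the `lookup_restaurant` tool, and the `Thought:`/`Action:`/`Final:` prompt format are illustrative stand-ins, not the paper's actual prompts or system.

```python
def lookup_restaurant(cuisine: str) -> str:
    """Toy external knowledge source (stand-in for a TOD database API)."""
    db = {"italian": "Trattoria Roma, city centre",
          "thai": "Bangkok House, north side"}
    return db.get(cuisine.lower(), "no match found")

def fake_llm(transcript: str) -> str:
    """Scripted stand-in for an LLM: first reason and act, then answer."""
    if "Observation:" not in transcript:
        return ("Thought: The user wants Italian food; I should query the DB.\n"
                "Action: lookup_restaurant[italian]")
    return ("Thought: I have the result.\n"
            "Final: I recommend Trattoria Roma in the city centre.")

def react_dialogue_turn(user_utterance: str, llm=fake_llm, max_steps: int = 4) -> str:
    """One ReAct turn: interleave Thought/Action steps with tool Observations."""
    transcript = f"User: {user_utterance}"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += "\n" + step
        if "Final:" in step:
            return step.split("Final:", 1)[1].strip()
        if "Action:" in step:
            # Parse "Action: tool[argument]" and run the matching tool,
            # feeding its output back as an Observation.
            call = step.split("Action:", 1)[1].strip()
            name, arg = call.split("[", 1)
            if name.strip() == "lookup_restaurant":
                transcript += f"\nObservation: {lookup_restaurant(arg.rstrip(']'))}"
    return "Sorry, I could not complete the request."

print(react_dialogue_turn("Can you find me an Italian restaurant?"))
# → I recommend Trattoria Roma in the city centre.
```

The key design point the paper evaluates is exactly this interleaving: rather than answering directly, the model decides when to invoke external knowledge mid-dialogue, which the authors find improves perceived naturalness even when objective task success lags behind SOTA pipelines.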
Problem

Research questions and friction points this paper is trying to address.

Evaluating ReAct prompting for task-oriented dialogue performance.
Comparing ReAct-LLMs with state-of-the-art methods in simulations.
Assessing human satisfaction with ReAct-LLMs despite lower success rates.
Innovation

Methods, ideas, or system contributions that make the work stand out.

ReAct prompting enhances task-oriented dialogue.
ReAct-LLMs evaluated via simulation and user testing.
Higher user satisfaction despite lower success rates.
Michelle Elizabeth
Orange, France
Morgan Veyret
Orange, France
Miguel Couceiro
Full Professor at IST, U.Lisboa, INESC-ID
Knowledge discovery, analogy-based reasoning, decision making, fair and explainable models
Ondrej Dusek
Charles University, Czechia
L. Rojas-Barahona
Orange, France