Beyond Single-User Dialogue: Assessing Multi-User Dialogue State Tracking Capabilities of Large Language Models

📅 2025-06-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the underexplored challenge of multi-user dialogue state tracking (DST), presenting the first systematic extension of single-user DST benchmarks to multi-user settings. To construct multi-speaker evaluation data cost-effectively and controllably, without manual annotation, we propose an automated utterance-injection method grounded in speech act theory. Evaluating mainstream large language models with zero-shot inference on the resulting multi-user dialogues, we observe a substantial performance degradation (average F1 drop of 23.6%), exposing critical limitations in modeling speaker-interaction dynamics. Our contributions include: (1) the first open-source multi-user DST benchmark; (2) a reproducible, theory-informed data-generation framework; and (3) a principled failure analysis identifying key bottlenecks in role-aware dialogue understanding. This work establishes foundational resources and insights for advancing robust, multi-role dialogue state tracking.
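The core data-construction idea, injecting a second user's utterance into an existing single-user dialogue at a controlled position, can be sketched as follows. This is a hypothetical illustration, not the paper's code: the `Turn` class, the speech-act labels, and the template utterances are all assumptions chosen to make the mechanism concrete.

```python
# Hypothetical sketch of the utterance-injection idea: given a single-user
# DST dialogue, insert a second user's utterance (typed by speech act)
# after a chosen turn. All names and templates here are illustrative.
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str   # "user", "user2", or "system"
    text: str

# Toy templates standing in for LLM-generated second-user utterances,
# keyed by an assumed speech-act label.
SPEECH_ACT_TEMPLATES = {
    "agree":   "Yes, that works for me too.",
    "request": "Could we also make it for four people?",
    "inform":  "Actually, I prefer the city centre.",
}

def inject_second_user(dialogue, turn_idx, speech_act):
    """Return a copy of the dialogue with a second user's utterance
    inserted directly after the turn at turn_idx."""
    injected = Turn(speaker="user2", text=SPEECH_ACT_TEMPLATES[speech_act])
    return dialogue[: turn_idx + 1] + [injected] + dialogue[turn_idx + 1 :]

dialogue = [
    Turn("user", "I need a restaurant in the north for two people."),
    Turn("system", "Sure, what cuisine would you like?"),
]
multi = inject_second_user(dialogue, 0, "request")
print([t.speaker for t in multi])  # → ['user', 'user2', 'system']
```

In the paper the injected utterance would come from an LLM conditioned on the speech-act type rather than a fixed template; the fixed dictionary here only stands in for that generation step.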

📝 Abstract
Large language models (LLMs) have demonstrated remarkable performance in zero-shot dialogue state tracking (DST), reducing the need for task-specific training. However, conventional DST benchmarks primarily focus on structured user-agent conversations, failing to capture the complexities of real-world multi-user interactions. In this study, we assess the robustness of LLMs in multi-user DST while minimizing dataset construction costs. Inspired by recent advances in LLM-based data annotation, we extend an existing DST dataset by generating utterances of a second user based on speech act theory. Our methodology systematically incorporates a second user's utterances into conversations, enabling a controlled evaluation of LLMs in multi-user settings. Experimental results reveal a significant performance drop compared to single-user DST, highlighting the limitations of current LLMs in extracting and tracking dialogue states amidst multiple speakers. Our findings emphasize the need for future research to enhance LLMs for multi-user DST scenarios, paving the way for more realistic and robust DST models.
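Since the evaluation reports an F1 drop, a minimal sketch of slot-level F1 for DST may help fix ideas. It assumes predicted and gold dialogue states are represented as sets of (slot, value) pairs; the paper does not specify its exact metric implementation, so this is illustrative only.

```python
# Minimal sketch of slot-level F1 for DST evaluation, assuming states
# are sets of (domain-slot, value) pairs. Illustrative, not the paper's code.
def slot_f1(pred, gold):
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)          # correctly predicted slot-value pairs
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = {("restaurant-area", "north"), ("restaurant-people", "2")}
pred = {("restaurant-area", "north"), ("restaurant-people", "4")}
print(round(slot_f1(pred, gold), 3))  # → 0.5
```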
Problem

Research questions and friction points this paper is trying to address.

Assessing LLMs' multi-user dialogue state tracking capabilities
Extending DST datasets to include multi-user interactions
Evaluating performance drop in LLMs for multi-user DST
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extend DST dataset with second user utterances
Use speech act theory for multi-user annotation
Evaluate LLMs in controlled multi-user settings
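The controlled zero-shot evaluation in the last point amounts to prompting an LLM with the multi-user dialogue history and the slots to track. A hedged sketch of such a prompt builder, with wording and slot names that are assumptions rather than the paper's actual prompt:

```python
# Hedged sketch of zero-shot DST prompt construction for a multi-user
# dialogue. The instruction wording and slot names are illustrative;
# the paper's exact prompt and models are not reproduced here.
def build_dst_prompt(turns, slots):
    """turns: list of (speaker, text) pairs; slots: slot names to track."""
    history = "\n".join(f"{spk}: {txt}" for spk, txt in turns)
    slot_list = ", ".join(slots)
    return (
        "Track the dialogue state for the following multi-user conversation.\n"
        f"Slots to fill: {slot_list}\n\n"
        f"{history}\n\n"
        "Output the state as slot=value pairs, one per line."
    )

prompt = build_dst_prompt(
    [("user1", "Book a table in the north."),
     ("user2", "Make it for four people.")],
    ["restaurant-area", "restaurant-people"],
)
print(prompt.splitlines()[0])
# → Track the dialogue state for the following multi-user conversation.
```

Keeping the speaker labels (`user1`, `user2`) in the serialized history is what distinguishes the multi-user condition from the standard single-user setup, which is the variable the controlled evaluation isolates.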