🤖 AI Summary
Zero-shot, open-vocabulary dialogue state tracking (DST) in task-oriented dialogues remains challenging: existing methods rely on predefined slot values and gold domain labels, lightweight models generalize poorly, and large language models (LLMs) carry prohibitive computational overhead.
Method: We propose a zero-shot, open-vocabulary framework that jointly models domain classification and DST. It reformulates DST as a question-answering (QA) task so that lightweight models can be adapted efficiently, employs self-refining prompts to improve the generalization and inference efficiency of LLMs, and unifies domain classification and state tracking in a single end-to-end prediction pipeline.
Contribution/Results: Combining the QA-based task formulation, self-refining prompts, and a reduced number of LLM API calls, the approach achieves up to 20% higher joint goal accuracy (JGA) than prior SOTA on Multi-WOZ 2.1, with up to 90% fewer LLM API requests.
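To make the QA reformulation concrete, the sketch below shows one way DST could be recast as extractive question answering for a lightweight model, assuming a Hugging Face QA pipeline. The slot-to-question mapping, confidence threshold, and domain filtering are illustrative assumptions, not the paper's exact design.

```python
# Illustrative sketch (not the paper's implementation): DST recast as extractive
# question answering with an off-the-shelf lightweight QA model.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

# Hypothetical open-vocabulary slot schema: slot name -> natural-language question.
SLOT_QUESTIONS = {
    "hotel-area": "Which area of town does the user want the hotel to be in?",
    "hotel-pricerange": "What price range does the user want for the hotel?",
    "restaurant-food": "What type of food is the user looking for?",
}

def track_state(dialogue_history: str, active_domain: str, threshold: float = 0.3) -> dict:
    """Fill slots of the predicted domain by asking one QA question per slot."""
    state = {}
    for slot, question in SLOT_QUESTIONS.items():
        if not slot.startswith(active_domain):
            continue  # joint pipeline: only query slots of the predicted domain
        pred = qa(question=question, context=dialogue_history)
        if pred["score"] >= threshold:  # low-confidence answers leave the slot empty
            state[slot] = pred["answer"]
    return state

history = ("User: I need a place to stay in the north, something cheap. "
           "System: Sure, any other requirements?")
print(track_state(history, active_domain="hotel"))  # e.g. {'hotel-area': 'the north', ...}
```

Because slot values are extracted from the dialogue text rather than selected from an ontology, the tracker stays open-vocabulary; restricting questions to the predicted domain is what the joint domain-classification step buys.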
📝 Abstract
Dialogue State Tracking (DST) is crucial for understanding user needs and executing appropriate system actions in task-oriented dialogues. Most existing DST methods are designed to work within predefined ontologies and assume the availability of gold domain labels, and they struggle to adapt to new slot values. While Large Language Model (LLM)-based systems show promising zero-shot DST performance, they either require extensive computational resources or underperform existing fully trained systems, limiting their practicality. To address these limitations, we propose a zero-shot, open-vocabulary system that integrates domain classification and DST in a single pipeline. Our approach includes reformulating DST as a question-answering task for less capable models and employing self-refining prompts for more adaptable ones. Our system does not rely on fixed slot values defined in the ontology, allowing it to adapt dynamically. We compare our approach with existing SOTA and show that it achieves up to 20% higher Joint Goal Accuracy (JGA) than previous methods on datasets like Multi-WOZ 2.1, with up to 90% fewer requests to the LLM API.
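For the more capable models, a self-refining prompt loop might look like the following minimal sketch, written against the OpenAI chat-completions client. The prompts, model name, and one-call-per-turn batching are assumptions made for illustration, not the paper's exact prompting strategy.

```python
# Minimal sketch of a self-refining prompt loop for an instruction-following LLM.
import json
from openai import OpenAI

client = OpenAI()

def call_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any chat-completion model would do
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

def predict_state(dialogue_history: str, max_refinements: int = 1) -> dict:
    # One call covers domain classification and all slots for the turn,
    # instead of one call per slot, which keeps the number of API requests low.
    draft = call_llm(
        "Read the dialogue, identify the active domain, and return the dialogue "
        'state as JSON of the form {"domain": ..., "slots": {slot: value}}. '
        "Use only values mentioned in the dialogue, and return JSON only.\n\nDialogue:\n"
        f"{dialogue_history}"
    )
    for _ in range(max_refinements):
        # Self-refinement: the model audits and corrects its own draft state.
        draft = call_llm(
            "Check the following dialogue state against the dialogue. Remove slots "
            "the dialogue does not support, fix wrong values, and return the "
            "corrected JSON only.\n\nDialogue:\n"
            f"{dialogue_history}\n\nState:\n{draft}"
        )
    return json.loads(draft)
```

Under these assumptions, each turn costs a small fixed number of LLM calls (draft plus refinements) regardless of how many slots the domain defines, which is consistent with the reported reduction in API requests.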