Empowering LLMs in Task-Oriented Dialogues: A Domain-Independent Multi-Agent Framework and Fine-Tuning Strategy

📅 2025-05-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the weak logical reasoning and poor generalization of lightweight large language models (LLMs) in task-oriented dialogue, this paper proposes a Domain-Independent Multi-Agent Framework (DIMF), which decouples intent recognition, slot filling, and response generation into cooperating agents, thereby reducing the learning complexity inherent in monolithic single-agent paradigms. The paper further introduces a Data Distribution Adaptation (DDA) strategy to mitigate the training degradation that Direct Preference Optimization (DPO) exhibits on small-scale models. Experiments on MultiWOZ show that DIMF outperforms existing baselines on average, with notable gains in zero-shot cross-domain transfer. These results indicate that combining multi-agent architectural decomposition with DDA-based preference optimization effectively enhances the task-oriented dialogue capabilities of lightweight LLMs.

๐Ÿ“ Abstract
Task-oriented dialogue systems based on Large Language Models (LLMs) have gained increasing attention across various industries and achieved significant results. Current approaches condense complex procedural workflows into a single agent to achieve satisfactory performance on large-scale LLMs. However, these approaches face challenges to achieve comparable performance on fine-tuned lightweight LLMs, due to their limited capabilities in handling multiple complex logic. In this work, we design a Domain-Independent Multi-Agent Framework (DIMF), which contains Intent Classification Agent, Slot Filling Agent and Response Agent. This approach simplifies the learning complexity and enhances the generalization ability by separating the tasks into domain-independent components. In this framework, we enhance the capabilities in contextual understanding using the Direct Preference Optimisation (DPO) method, and propose a simple and effective Data Distribution Adaptation (DDA) method to mitigate degradation issues during DPO training. Experiments conducted on the MultiWOZ datasets show that our proposed method achieves a better average performance among all the baselines. Extensive analysis also demonstrates that our proposed framework exhibits excellent generalizability and zero-shot capability.
Problem

Research questions and friction points this paper is trying to address.

Enhancing lightweight LLMs for task-oriented dialogues
Simplifying complex logic via multi-agent framework
Improving generalization with domain-independent components
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-Independent Multi-Agent Framework simplifies tasks
Direct Preference Optimisation enhances contextual understanding
Data Distribution Adaptation mitigates DPO training degradation
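The three-agent decomposition described above can be sketched as a simple sequential pipeline. This is an illustrative sketch only, not the paper's implementation: each rule-based function below is a hypothetical stand-in for a fine-tuned lightweight LLM agent, and all names and logic are assumptions.

```python
# DIMF-style pipeline sketch: intent classification, slot filling, and
# response generation handled by separate, specialised agents.
# Each agent body is a rule-based stub standing in for an LLM call.

def intent_agent(utterance: str) -> str:
    """Classify the user's intent (stub for a fine-tuned LLM agent)."""
    if "book" in utterance.lower():
        return "make_booking"
    return "find_info"

def slot_agent(utterance: str, intent: str) -> dict:
    """Extract slot values from the utterance (stub for an LLM agent)."""
    slots = {}
    for token in utterance.lower().split():
        if token.isdigit():
            slots["people"] = int(token)
    if "hotel" in utterance.lower():
        slots["domain"] = "hotel"
    return slots

def response_agent(intent: str, slots: dict) -> str:
    """Compose a reply from the structured dialogue state (stub)."""
    return f"Intent={intent}, slots={slots}. How else can I help?"

def dialogue_turn(utterance: str) -> str:
    # Agents run sequentially; each sees only its own sub-task,
    # which is the decomposition DIMF uses to simplify learning.
    intent = intent_agent(utterance)
    slots = slot_agent(utterance, intent)
    return response_agent(intent, slots)

print(dialogue_turn("I want to book a hotel for 3 people"))
```

Because each agent solves one narrow, domain-independent sub-task, a lightweight model can be fine-tuned per agent instead of learning the whole workflow at once.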
Authors
Zihao Feng, Harbin Institute of Technology (Natural Language Processing, Large Language Model)
Xiaoxue Wang, Platform and Content Group, Tencent
Bowen Wu, Platform and Content Group, Tencent
Weihong Zhong, Harbin Institute of Technology (Multimodal Learning, Large Language Model)
Zhen Xu, Platform and Content Group, Tencent
Hailong Cao, Harbin Institute of Technology
Tiejun Zhao, Faculty of Computing, Harbin Institute of Technology
Ying Li, School of Software & Microelectronics, Peking University
Baoxun Wang, PCG, Tencent (Natural Language Processing, Deep Learning, Chat-Bot)