AI Summary
To address the weak logical reasoning and poor generalization of lightweight large language models (LLMs) in task-oriented dialogue, this paper proposes a domain-agnostic multi-agent framework (DIMF), which decouples intent recognition, slot filling, and response generation into cooperative agents, thereby reducing the learning complexity inherent in monolithic agent paradigms. Furthermore, we introduce a data distribution adaptation (DDA) strategy to mitigate training degradation of direct preference optimization (DPO) on small-scale models. Experiments on MultiWOZ demonstrate that DIMF consistently outperforms existing baselines across all metrics. Notably, it achieves substantial gains in zero-shot cross-domain transfer performance. These results validate that the synergistic combination of multi-agent architectural decomposition and DDA-based joint optimization effectively enhances the task-oriented dialogue capabilities of lightweight LLMs.
Abstract
Task-oriented dialogue systems based on Large Language Models (LLMs) have gained increasing attention across various industries and achieved significant results. Current approaches condense complex procedural workflows into a single agent to achieve satisfactory performance on large-scale LLMs. However, these approaches struggle to achieve comparable performance on fine-tuned lightweight LLMs, due to their limited capability in handling multiple complex logical tasks. In this work, we design a Domain-Independent Multi-Agent Framework (DIMF), which contains an Intent Classification Agent, a Slot Filling Agent, and a Response Agent. This approach reduces learning complexity and enhances generalization ability by separating the task into domain-independent components. Within this framework, we strengthen contextual understanding using the Direct Preference Optimization (DPO) method, and propose a simple and effective Data Distribution Adaptation (DDA) method to mitigate degradation issues during DPO training. Experiments conducted on the MultiWOZ dataset show that our proposed method achieves the best average performance among all baselines. Extensive analysis also demonstrates that our framework exhibits excellent generalizability and zero-shot capability.
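The decoupling described above can be sketched as a pipeline of three cooperating agents, each responsible for one sub-task. The class names, the `DialogueState` structure, and the keyword-based stubs below are hypothetical illustrations only; in DIMF each agent would be backed by a fine-tuned lightweight LLM, and the paper does not specify this interface.

```python
# Minimal sketch of a decoupled three-agent dialogue turn (illustration only).
from dataclasses import dataclass, field

@dataclass
class DialogueState:
    """Shared state passed between the agents within one turn."""
    intent: str = ""
    slots: dict = field(default_factory=dict)

class IntentClassificationAgent:
    def run(self, utterance: str) -> str:
        # Stubbed with a trivial keyword rule; DIMF would call an LLM here.
        return "find_hotel" if "hotel" in utterance else "unknown"

class SlotFillingAgent:
    def run(self, utterance: str, intent: str) -> dict:
        # Domain-independent slot extraction, also stubbed for illustration.
        return {"area": "centre"} if "centre" in utterance else {}

class ResponseAgent:
    def run(self, state: DialogueState) -> str:
        # Generates the reply from the structured state, not the raw dialogue,
        # which is what keeps each agent's sub-task simple.
        return f"Searching with intent={state.intent}, slots={state.slots}"

def dimf_turn(utterance: str) -> str:
    """One dialogue turn routed through the three agents in sequence."""
    state = DialogueState()
    state.intent = IntentClassificationAgent().run(utterance)
    state.slots = SlotFillingAgent().run(utterance, state.intent)
    return ResponseAgent().run(state)

print(dimf_turn("I need a hotel in the centre"))
```

The point of the sketch is the division of labor: each agent sees only the input relevant to its sub-task, so a small model never has to handle intent, slots, and generation in a single pass.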