AI Summary
To address the weak logical reasoning and poor generalization of lightweight large language models (LLMs) in task-oriented dialogue, this paper proposes a domain-agnostic multi-agent framework (DIMF), which decouples intent recognition, slot filling, and response generation into cooperative agents, thereby reducing the learning complexity inherent in monolithic agent paradigms. Furthermore, we introduce a data distribution adaptation (DDA) strategy to mitigate training degradation of direct preference optimization (DPO) on small-scale models. Experiments on MultiWOZ demonstrate that DIMF consistently outperforms existing baselines across all metrics. Notably, it achieves substantial gains in zero-shot cross-domain transfer performance. These results validate that the synergistic combination of multi-agent architectural decomposition and DDA-based joint optimization effectively enhances the task-oriented dialogue capabilities of lightweight LLMs.
Abstract
Task-oriented dialogue systems based on Large Language Models (LLMs) have gained increasing attention across various industries and achieved significant results. Current approaches condense complex procedural workflows into a single agent to achieve satisfactory performance on large-scale LLMs. However, these approaches struggle to achieve comparable performance on fine-tuned lightweight LLMs, due to their limited capability in handling multiple complex logical tasks. In this work, we design a Domain-Independent Multi-Agent Framework (DIMF), which contains an Intent Classification Agent, a Slot Filling Agent, and a Response Agent. This approach reduces learning complexity and enhances generalization ability by separating the task into domain-independent components. Within this framework, we strengthen contextual understanding using the Direct Preference Optimization (DPO) method, and propose a simple and effective Data Distribution Adaptation (DDA) method to mitigate degradation issues during DPO training. Experiments conducted on the MultiWOZ dataset show that our proposed method achieves the best average performance among all baselines. Extensive analysis also demonstrates that our framework exhibits excellent generalizability and zero-shot capability.
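The decoupling described above can be sketched as a pipeline of three cooperating agents, each responsible for one sub-task. The class names, the `DialogueState` structure, and the keyword-based stubs below are hypothetical illustrations only; in DIMF each agent would be backed by a fine-tuned lightweight LLM, and the paper does not specify this interface.

```python
# Minimal sketch of a decoupled three-agent dialogue turn (illustration only).
from dataclasses import dataclass, field

@dataclass
class DialogueState:
    """Shared state passed between the agents within one turn."""
    intent: str = ""
    slots: dict = field(default_factory=dict)

class IntentClassificationAgent:
    def run(self, utterance: str) -> str:
        # Stubbed with a trivial keyword rule; DIMF would call an LLM here.
        return "find_hotel" if "hotel" in utterance else "unknown"

class SlotFillingAgent:
    def run(self, utterance: str, intent: str) -> dict:
        # Domain-independent slot extraction, also stubbed for illustration.
        return {"area": "centre"} if "centre" in utterance else {}

class ResponseAgent:
    def run(self, state: DialogueState) -> str:
        # Generates the reply from the structured state, not the raw dialogue,
        # which is what keeps each agent's sub-task simple.
        return f"Searching with intent={state.intent}, slots={state.slots}"

def dimf_turn(utterance: str) -> str:
    """One dialogue turn routed through the three agents in sequence."""
    state = DialogueState()
    state.intent = IntentClassificationAgent().run(utterance)
    state.slots = SlotFillingAgent().run(utterance, state.intent)
    return ResponseAgent().run(state)

print(dimf_turn("I need a hotel in the centre"))
```

The point of the sketch is the division of labor: each agent sees only the input relevant to its sub-task, so a small model never has to handle intent, slots, and generation in a single pass.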