Multi-Party Supervised Fine-tuning of Language Models for Multi-Party Dialogue Generation

📅 2024-12-06
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing large language models (LLMs) are predominantly fine-tuned on two-party dialogue data, exhibiting poor generalization in multi-party dialogues (MPDs), thereby limiting their applicability to real-world collaborative scenarios such as meetings and group discussions. To address this, we propose MuPaS, a Multi-Role Supervised Fine-tuning framework that introduces the first end-to-end supervised fine-tuning paradigm tailored for MPDs. MuPaS constructs a novel multi-role dialogue dataset and jointly optimizes two core tasks: next-speaker prediction and role-aware response generation. Furthermore, it incorporates a configurable multi-party dialogue simulation training strategy to enhance contextual and social awareness. Extensive experiments demonstrate that MuPaS achieves state-of-the-art performance in multi-party response quality, speaker prediction accuracy, and cross-topic/role/scenario generalization—significantly outperforming multi-agent-based baselines. This work establishes a new foundation for virtual rehearsal systems and immersive multi-user interaction in the metaverse.

📝 Abstract
Large language models (LLMs) are usually fine-tuned to participate in dyadic or two-party dialogues, and cannot adapt well to multi-party dialogues (MPDs), which hinders their application in scenarios such as multi-person meetings, discussions, and daily communication. Previous LLM-based research mainly focuses on multi-agent frameworks, while the base LLMs are still fine-tuned pairwise. In this work, we design a multi-party fine-tuning framework (MuPaS) for LLMs on multi-party dialogue datasets, and show that such a straightforward framework lets the LLM align with the multi-party conversation style efficiently and effectively. We also design two training strategies that convert MuPaS into an MPD simulator. Extensive experiments show that MuPaS achieves state-of-the-art multi-party responses, higher next-speaker prediction accuracy, higher human- and automatically evaluated utterance quality, and can even generate reasonably given out-of-distribution scene, topic, and role descriptions. The MuPaS framework bridges LLM training with more complex multi-party applications, such as conversation generation, virtual rehearsal, or the metaverse.
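The abstract describes fine-tuning on multi-party dialogue data so the model learns both who speaks next and what they say. A minimal sketch of how such data might be flattened into supervised fine-tuning examples is shown below; the prompt/target schema, speaker names, and formatting are illustrative assumptions, not the paper's actual data pipeline.

```python
# Hypothetical sketch: flatten a multi-party dialogue into SFT examples that
# pair next-speaker prediction with role-conditioned utterance generation.
# The schema below is an assumption for illustration, not the MuPaS format.

def build_sft_examples(dialogue):
    """dialogue: ordered list of (speaker, utterance) tuples."""
    examples = []
    for i in range(1, len(dialogue)):
        # Dialogue history rendered as "Speaker: utterance" lines.
        history = "\n".join(f"{s}: {u}" for s, u in dialogue[:i])
        speaker, utterance = dialogue[i]
        # Task 1: predict who speaks next, given the history.
        examples.append({
            "prompt": history + "\nNext speaker:",
            "target": speaker,
        })
        # Task 2: generate the utterance, conditioned on the speaker role.
        examples.append({
            "prompt": history + f"\n{speaker}:",
            "target": utterance,
        })
    return examples

dialogue = [
    ("Alice", "Shall we start the meeting?"),
    ("Bob", "Yes, the agenda is ready."),
    ("Carol", "I have one item to add."),
]
examples = build_sft_examples(dialogue)
```

Each dialogue turn thus yields two training pairs, which matches the two jointly optimized tasks named in the AI summary (next-speaker prediction and role-aware response generation).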
Problem

Research questions and friction points this paper is trying to address.

Adapting LLMs for multi-party dialogue generation
Improving next-speaker prediction accuracy in dialogues
Enhancing utterance quality in multi-party conversations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-party fine-tuning framework for LLMs
Training strategies for MPD simulator
State-of-the-art multi-party response generation
Xiaoyu Wang
Geely Automobile Research Institute (Ningbo) Co., Ltd, Beijing Institute of Technology
Ningyuan Xi
Beihang University
LLM · Natural Language Processing · Machine Learning
Teng Chen
Geely Automobile Research Institute (Ningbo) Co., Ltd
Qingqing Gu
Geely Automobile Research Institute (Ningbo) Co., Ltd
Yue Zhao
Geely Automobile Research Institute (Ningbo) Co., Ltd
Xiaokai Chen
Beijing Institute of Technology
Zhonglin Jiang
Geely Automobile Research Institute (Ningbo) Co., Ltd
Yong Chen
Geely Automobile Research Institute (Ningbo) Co., Ltd
Luo Ji
Alibaba Group
Reinforcement Learning · Automatic Control