ConsistentChat: Building Skeleton-Guided Consistent Dialogues for Large Language Models from Scratch

๐Ÿ“… 2025-06-04
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address low task completion rates in multi-turn dialogue caused by context drift, this paper proposes a skeleton-guided multi-turn instruction generation framework. The method introduces the first taxonomy of nine human dialogue intent trajectories, explicitly encoding global dialogue structure into a structured generation skeleton; it further employs intent-constrained controllable data distillation to construct, from scratch, ConsistentChatโ€”the first large-scale, cross-turn consistent multi-turn instruction dataset (15,000+ dialogues, 224,392 utterances). Experiments on Light, TopDial, and MT-Eval benchmarks demonstrate that the framework improves dialogue consistency by 20โ€“30% and achieves up to a 15% gain in task success rate. Key contributions include: (1) intent-driven dialogue structure modeling; (2) a skeleton-guided paradigm for controllable synthetic data generation; and (3) ConsistentChat, the first publicly available large-scale instruction dataset ensuring cross-turn consistency.

Technology Category

Application Category

๐Ÿ“ Abstract
Current instruction data synthesis methods primarily focus on single-turn instructions and often neglect cross-turn coherence, resulting in context drift and reduced task completion rates in extended conversations. To address this limitation, we propose Skeleton-Guided Multi-Turn Dialogue Generation, a framework that constrains multi-turn instruction synthesis by explicitly modeling human conversational intent. It operates in two stages: (1) Intent Modeling, which captures the global structure of human dialogues by assigning each conversation to one of nine well-defined intent trajectories, ensuring a coherent and goal-oriented information flow; and (2) Skeleton Generation, which constructs a structurally grounded sequence of user queries aligned with the modeled intent, thereby serving as a scaffold that constrains and guides the downstream instruction synthesis process. Based on this process, we construct ConsistentChat, a multi-turn instruction dataset with approximately 15,000 multi-turn conversations and 224,392 utterances. Experiments on the Light, Topdial, and MT-Eval benchmarks show that models fine-tuned on ConsistentChat achieve a 20-30% improvement in chat consistency and up to a 15% increase in task success rate, significantly outperforming models trained on existing single-turn and multi-turn instruction datasets.
Problem

Research questions and friction points this paper is trying to address.

Addressing cross-turn coherence in multi-turn dialogues
Improving task completion rates in extended conversations
Enhancing chat consistency and intent modeling accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Skeleton-Guided Multi-Turn Dialogue Generation
Intent Modeling with nine defined trajectories
Skeleton Generation for coherent query sequences
๐Ÿ”Ž Similar Papers
No similar papers found.
J
Jiawei Chen
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Xinyan Guan
Xinyan Guan
Institute of Software, Chinese Academy of Sciences
Q
Qianhao Yuan
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences
G
Guozhao Mo
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences
W
Weixiang Zhou
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences
Yaojie Lu
Yaojie Lu
Institute of Software, Chinese Academy of Sciences
Information ExtractionLarge Language Models
H
Hongyu Lin
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences
Ben He
Ben He
Professor, University of Chinese Academy of Sciences
Natural Language ProcessingInformation Retrieval
Le Sun
Le Sun
Institute of Software, CAS
information_retrievalnatural_language_processing
X
Xianpei Han
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences