ProKG-Dial: Progressive Multi-Turn Dialogue Construction with Domain Knowledge Graphs

📅 2025-08-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address insufficient domain knowledge coverage and weak semantic coherence in large language models (LLMs) for professional-domain dialogue, this paper proposes a knowledge-graph-based progressive multi-turn dialogue data construction method. First, community detection partitions the domain-specific knowledge graph into semantically coherent subgraphs; then, structured question-answer pairs are generated incrementally around target entities to ensure topical focus and knowledge progression. The method integrates graph partitioning, LLM-driven multi-turn dialogue generation, multi-stage filtering, and supervised fine-tuning, validated through both automated evaluation and human assessment. Experiments on a medical knowledge graph demonstrate that the generated data significantly improves dialogue relevance, diversity, and domain knowledge coverage. Fine-tuned models outperform existing baselines across multiple metrics, confirming the method's effectiveness and generalizability to specialized domains.

πŸ“ Abstract
Current large language models (LLMs) excel at general NLP tasks but often lack domain-specific precision in professional settings. Building a high-quality domain-specific multi-turn dialogue dataset is essential for developing specialized conversational systems. However, existing methods such as manual annotation, simulated human-LLM interactions, and role-based LLM dialogues are resource-intensive or suffer from limitations in dialogue quality and domain coverage. To address these challenges, we introduce ProKG-Dial, a progressive framework for constructing knowledge-intensive multi-turn dialogue datasets using domain-specific knowledge graphs (KGs). ProKG-Dial leverages the structured nature of KGs to encode complex domain knowledge and relationships, providing a solid foundation for generating meaningful and coherent dialogues. Specifically, ProKG-Dial begins by applying community detection to partition the KG into semantically cohesive subgraphs. For each subgraph, the framework incrementally generates a series of questions and answers centered around a target entity, ensuring relevance and coverage. A rigorous filtering step is employed to maintain high dialogue quality. We validate ProKG-Dial on a medical knowledge graph by evaluating the generated dialogues in terms of diversity, semantic coherence, and entity coverage. Furthermore, we fine-tune a base LLM on the resulting dataset and benchmark it against several baselines. Both automatic metrics and human evaluations demonstrate that ProKG-Dial substantially improves dialogue quality and domain-specific performance, highlighting its effectiveness and practical utility.
Problem

Research questions and friction points this paper is trying to address.

LLMs lack domain-specific precision in professional settings
Existing methods for dialogue dataset construction are resource-intensive or limited
Need for high-quality domain-specific multi-turn dialogue datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive framework using domain knowledge graphs
Community detection for semantic subgraph partitioning
Incremental QA generation with rigorous filtering
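The first step above — partitioning the KG into semantically cohesive subgraphs via community detection — can be sketched in a few lines. The paper does not specify which community-detection algorithm is used, so this minimal, self-contained sketch uses simple label propagation over the entity adjacency; the triples and entity names are illustrative, not from the paper's medical KG.

```python
import random
from collections import Counter, defaultdict

def label_propagation(edges, seed=0, max_iters=20):
    """Partition a knowledge graph into communities via label propagation.

    `edges` is a list of (head, relation, tail) triples; relations are
    ignored for partitioning, only entity adjacency matters.
    """
    rng = random.Random(seed)
    adj = defaultdict(set)
    for h, _, t in edges:
        adj[h].add(t)
        adj[t].add(h)
    labels = {n: n for n in adj}  # each entity starts in its own community
    nodes = list(adj)
    for _ in range(max_iters):
        rng.shuffle(nodes)
        changed = False
        for n in nodes:
            # adopt the most common label among neighbors (ties broken randomly)
            counts = Counter(labels[m] for m in adj[n])
            best = max(counts.values())
            choice = rng.choice([l for l, c in counts.items() if c == best])
            if labels[n] != choice:
                labels[n] = choice
                changed = True
        if not changed:
            break
    groups = defaultdict(list)
    for n, l in labels.items():
        groups[l].append(n)
    return [sorted(g) for g in groups.values()]

# Toy medical-style triples (hypothetical entity names):
triples = [
    ("diabetes", "treated_by", "metformin"),
    ("diabetes", "has_symptom", "polyuria"),
    ("metformin", "interacts_with", "insulin"),
    ("asthma", "treated_by", "albuterol"),
    ("asthma", "has_symptom", "wheezing"),
]
subgraphs = label_propagation(triples)
```

Each returned subgraph would then seed the progressive QA stage: pick a target entity within it and generate successive question-answer turns over that entity's local relations.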
Yuanyuan Liang
East China Normal University, China
Xiaoman Wang
East China Normal University, China
Tingyu Xie
Zhejiang University, China
Lei Pan
Michigan Technological University