🤖 AI Summary
Existing dialogue datasets rarely capture both short-term and long-term memory, and they often lack thematic coherence and speaker consistency, which limits the fine-tuning and evaluation of large language models' memory capabilities. To address this, this work proposes AgenticAI-DialogGen, a modular, multi-agent framework that generates persona-grounded, topic-guided dialogues without manual annotation. The framework automatically extracts knowledge graphs, identifies dialogue topics, constructs speaker personas, and generates dialogues together with corresponding memory-oriented question-answer pairs. The resulting dataset, TopicGuidedChat (TGC), pairs explicit long-term memory (in the form of speaker-specific knowledge graphs) with short-term conversational context, and models fine-tuned on it perform better on memory-related question-answering tasks.
📝 Abstract
Recent advancements in Large Language Models (LLMs) have improved their ability to process extended conversational contexts, yet fine-tuning and evaluating short- and long-term memory remains difficult because no existing dataset encodes both short- and long-term conversational history. Existing conversational datasets lack memory grounding, overlook topic continuity, or rely on costly human annotation. To address these gaps, we introduce AgenticAI-DialogGen, a modular agent-based framework that generates persona-grounded and topic-guided conversations without human supervision. Starting from unstructured conversations, the framework uses LLM agents to extract knowledge graphs, identify topics, build speaker personas, and simulate topic-guided conversations. A QA module generates memory-grounded Question Answer (QA) pairs drawn from short- and long-term conversational histories. We also generate a new dataset, TopicGuidedChat (TGC), in which long-term memory is encoded as speaker-specific knowledge graphs and short-term memory as newly generated topic-guided conversations. Evaluations show that AgenticAI-DialogGen yields higher conversational quality and that LLMs fine-tuned on the TGC dataset achieve improved performance on memory-grounded QA tasks.
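To make the memory split described in the abstract concrete, the following is a minimal Python sketch of how one record might pair a speaker-specific knowledge graph (long-term memory) with a topic-guided conversation (short-term memory). The abstract does not specify the dataset schema, so every name here (`Triple`, `MemoryRecord`, `memory_sources`) is a hypothetical illustration, and the keyword lookup is a stand-in for demonstration only, not the framework's QA module.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Triple:
    """One knowledge-graph edge (hypothetical schema, not from the paper)."""
    subject: str
    relation: str
    obj: str


@dataclass
class MemoryRecord:
    """Assumed layout: long-term memory as triples, short-term as dialogue turns."""
    speaker: str
    long_term: List[Triple] = field(default_factory=list)
    # Each turn is (speaker name, utterance).
    short_term: List[Tuple[str, str]] = field(default_factory=list)


def memory_sources(record: MemoryRecord, keyword: str) -> List[str]:
    """Report which memory store mentions the keyword (simple substring match)."""
    kw = keyword.lower()
    sources = []
    if any(kw in f"{t.subject} {t.relation} {t.obj}".lower()
           for t in record.long_term):
        sources.append("long_term")
    if any(kw in utterance.lower() for _, utterance in record.short_term):
        sources.append("short_term")
    return sources


record = MemoryRecord(
    speaker="A",
    long_term=[Triple("A", "owns", "dog")],
    short_term=[("A", "I went hiking yesterday."), ("B", "Where did you go?")],
)
print(memory_sources(record, "dog"))
print(memory_sources(record, "hiking"))
```

A memory-grounded QA pair could then be labeled by the store that supports its answer: a question about the dog would draw on the knowledge graph, while a question about the hike would draw on the current conversation.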