🤖 AI Summary
This survey reviews the evaluation and enhancement of large language models (LLMs) in multi-turn interactive settings across realistic domains, including mathematics, programming, roleplay, healthcare, education, and adversarial jailbreaking, where key challenges involve long-horizon contextual consistency, coherence, fairness, and responsiveness. It organizes existing benchmarks and datasets into a taxonomy tailored to multi-turn dialogue evaluation, and reviews enhancement methods under three paradigms: model-centric strategies (contextual learning, supervised fine-tuning, reinforcement learning, and new architectures), external integrations (memory, retrieval, and knowledge graphs), and agent-based techniques for collaborative interaction. A curated, open-source resource repository (Awesome-Multi-Turn-LLMs) accompanies the survey, supporting rigorous, comparable, and scalable research on multi-turn LLM interaction.
📝 Abstract
Recent advancements in large language models (LLMs) have revolutionized their ability to handle single-turn tasks, yet real-world applications demand sophisticated multi-turn interactions. This survey provides a comprehensive review of recent advancements in evaluating and enhancing multi-turn interactions in LLMs. Focusing on task-specific scenarios, from instruction following in diverse domains such as math and coding to complex conversational engagements in roleplay, healthcare, education, and even adversarial jailbreak settings, we systematically examine the challenges of maintaining context, coherence, fairness, and responsiveness over prolonged dialogues. The paper organizes current benchmarks and datasets into coherent categories that reflect the evolving landscape of multi-turn dialogue evaluation. In addition, we review a range of enhancement methodologies under multi-turn settings, including model-centric strategies (contextual learning, supervised fine-tuning, reinforcement learning, and new architectures), external integration approaches (memory-augmented and retrieval-based methods, and knowledge graphs), and agent-based techniques for collaborative interactions. Finally, we discuss open challenges and propose future directions for research to further advance the robustness and effectiveness of multi-turn interactions in LLMs. Related resources and papers are available at https://github.com/yubol-cmu/Awesome-Multi-Turn-LLMs.