🤖 AI Summary
Dialogue data harvested from social media suffers from limited interaction patterns and unrealistic multi-party dynamics, driven by privacy restrictions and platform-specific biases. To address this, we propose a constraint-driven framework for multi-party dialogue synthesis: instruction-tuned large language models (LLMs) are given explicit structured dialogue constraints (speaker roles, stances, and turn-taking order) and can generate conversations either end-to-end or turn by turn. We also introduce a multidimensional evaluation framework that integrates LLM-as-a-judge automated assessment, human annotation, and interaction complexity analysis. Experiments demonstrate that turn-by-turn generation significantly outperforms end-to-end generation in constraint adherence (+23.6%) and lexical diversity (+18.4%), and that several open-source LLMs can already generate high-quality, high-fidelity multi-party dialogues. This work establishes a paradigm for controllable, rigorously evaluable, and privacy-preserving dialogue data construction.
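To make the constraint-driven setup concrete, here is a minimal Python sketch of how structured dialogue constraints (speaker, stance, and reply structure) could be encoded and rendered into a generation prompt. The `TurnConstraint` fields, function names, and prompt wording are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TurnConstraint:
    speaker: str                    # who produces this turn
    stance: str                     # e.g. "supports", "opposes", "neutral"
    reply_to: Optional[int] = None  # index of the turn being answered, if any

def render_constraints(topic: str, turns: list[TurnConstraint]) -> str:
    """Serialize the structured constraints into an instruction prompt."""
    lines = [f"Topic: {topic}", "Generate a multi-party conversation obeying:"]
    for i, t in enumerate(turns):
        target = f", replying to turn {t.reply_to}" if t.reply_to is not None else ""
        lines.append(f"  Turn {i}: {t.speaker} ({t.stance} the topic{target})")
    return "\n".join(lines)

# Example: a three-turn exchange with an explicit reply structure.
plan = [
    TurnConstraint("Alice", "supports"),
    TurnConstraint("Bob", "opposes", reply_to=0),
    TurnConstraint("Carol", "neutral", reply_to=0),
]
print(render_constraints("remote work", plan))
```

Making the reply structure explicit in the prompt is also what enables the constraint-adherence checks described above: each generated turn can be compared against its planned speaker, stance, and reply target.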
📝 Abstract
Multi-Party Conversations (MPCs) are widely studied across disciplines, with social media as a primary data source due to its accessibility. However, these datasets raise privacy concerns and often reflect platform-specific properties. For example, interactions between speakers may be limited by rigid platform structures (e.g., threads, tree-like discussions), which yield overly simplistic interaction patterns (e.g., as a consequence of "reply-to" links). This work explores the feasibility of generating diverse MPCs with instruction-tuned Large Language Models (LLMs) by providing deterministic constraints such as the dialogue structure and the participants' stances. We investigate two complementary strategies for leveraging LLMs in this context: (i) LLMs as MPC generators, where we task the LLM with generating a whole MPC at once, and (ii) LLMs as MPC parties, where the LLM generates one turn of the conversation at a time, given the conversation history. We then introduce an analytical framework to evaluate compliance with the constraints, content quality, and interaction complexity for both strategies. Finally, we assess the quality of the obtained MPCs via human annotation and LLM-as-a-judge evaluations. We find stark differences among LLMs, with only some able to generate high-quality MPCs. We also find that turn-by-turn generation yields better conformance to the constraints and higher linguistic variability than generating MPCs in one pass. Nonetheless, our structural and qualitative evaluation indicates that both generation strategies can yield high-quality MPCs.
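As a concrete illustration of the two strategies, the sketch below contrasts one-pass generation (strategy i) with a turn-by-turn loop (strategy ii). The `llm` callable stands in for any instruction-tuned model; the function names and prompt wording are hypothetical, not taken from the paper.

```python
from typing import Callable, List

Generate = Callable[[str], str]  # wrapper around any instruction-tuned LLM

def mpc_one_pass(llm: Generate, constraint_prompt: str) -> str:
    # Strategy (i): the LLM as MPC generator, producing the whole dialogue at once.
    return llm(constraint_prompt + "\n\nWrite the complete conversation now.")

def mpc_turn_by_turn(llm: Generate, constraint_prompt: str, n_turns: int) -> List[str]:
    # Strategy (ii): the LLM as an MPC party, emitting one turn at a time
    # while conditioning on the conversation history accumulated so far.
    history: List[str] = []
    for i in range(n_turns):
        prompt = (
            constraint_prompt
            + "\n\nConversation so far:\n" + "\n".join(history)
            + f"\n\nWrite only turn {i}, and nothing else."
        )
        history.append(llm(prompt).strip())
    return history
```

One plausible reading of the reported results is that the turn-by-turn loop keeps each generation step focused on a single constrained turn, which would make it easier for the model to satisfy per-turn constraints than when planning an entire MPC in one completion.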