Can xLLMs Understand the Structure of Dialog? Exploring Multilingual Response Generation in Complex Scenarios

📅 2025-01-20

📈 Citations: 0

✨ Influential: 0

career value

195K/year

🤖 AI Summary

Large language models (xLLMs) exhibit structural understanding bottlenecks in multilingual, multiparty dialogue, yet existing benchmarks lack realistic complexity and multilingual parallelism. Method: We introduce XMP—a high-quality, multilingual parallel dialogue benchmark derived from authentic multiparty podcasts—featuring ≥3 participants per sample, covering sociocultural and political topics, with fine-grained dialogue structure annotations and cross-lingual consistency evaluation. Contribution/Results: Our empirical analysis reveals critical deficiencies: xLLMs achieve only 52% role-tracking accuracy and suffer a 37% drop in response coherence across languages, challenging the prevailing “multilingual complementarity” hypothesis. We propose a novel paradigm for modeling complex dialogue grounded in real-world podcast data, supported by controlled generation experiments and mechanistic analysis. The XMP dataset and evaluation framework are publicly released to advance standardized assessment of multilingual multiparty dialogue understanding.

Technology Category

Application Category

📝 Abstract

Multilingual research has garnered increasing attention, especially in the domain of dialogue systems. The rapid advancements in large language models (LLMs) have fueled the demand for high-performing multilingual models. However, two major challenges persist: the scarcity of high-quality multilingual datasets and the limited complexity of existing datasets in capturing realistic dialogue scenarios. To address these gaps, we introduce XMP, a high-quality parallel Multilingual dataset sourced from Multi-party Podcast dialogues. Each sample in the dataset features at least three participants discussing a wide range of topics, including society, culture, politics, and entertainment.Through extensive experiments, we uncover significant limitations in previously recognized multilingual capabilities of LLMs when applied to such complex dialogue scenarios. For instance, the widely accepted multilingual complementary ability of LLMs is notably impacted. By conducting further experiments, we explore the mechanisms of LLMs in multilingual environments from multiple perspectives, shedding new light on their performance in real-world, diverse conversational contexts.

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Multilingual Dialogue

Cross-lingual Understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual Dataset

Complex Dialogue

Large Language Model Performance

🔎 Similar Papers

Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models