Lost in Interpretation: The Plausibility-Faithfulness Trade-off in Cross-Lingual Explanations

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines an overlooked issue in multilingual large language models: they are often asked to explain non-English inputs in English, which may compromise causal faithfulness and lose sociopragmatic information. The work systematically evaluates such “English-mediated explanations” in cross-lingual settings using extractive explanations, measuring plausibility as span agreement with human rationales and faithfulness through comprehensiveness and sufficiency. Across 3 tasks, 5 languages, and 2 multilingual LLM families, English-mediated explanations stay fluent and task accuracy remains stable, and they can even agree more closely with human rationales, yet their evidence is less causally grounded in the model's prediction: comprehensiveness degrades by up to 5.7× relative to native-language conditions. For socially nuanced classification, English pivots also fail to preserve pragmatic cues, which reduces both faithfulness and span agreement. These results highlight the hidden cost of using English as an explanatory intermediary.
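Span agreement with human rationales is often scored as token-level overlap. The sketch below assumes a token-level F1 over sets of token indices; the function name `span_token_f1` and the index-set representation are illustrative assumptions, and the paper's exact agreement measure may differ.

```python
from typing import Iterable


def span_token_f1(predicted: Iterable[int], gold: Iterable[int]) -> float:
    """Token-level F1 between model evidence tokens and human rationale tokens.

    Both arguments are collections of token indices into the same input.
    This is an assumed, commonly used agreement measure, not necessarily
    the one used in the paper.
    """
    predicted_set, gold_set = set(predicted), set(gold)
    if not predicted_set and not gold_set:
        return 1.0  # both empty: treat as perfect agreement
    overlap = len(predicted_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the model marks tokens 1-3, annotators mark tokens 2-4.
score = span_token_f1([1, 2, 3], [2, 3, 4])
print(f"token-level F1: {score:.3f}")
```

A score like this captures plausibility only: it says how closely the highlighted tokens match what humans would mark, not whether the model actually relied on them.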

📝 Abstract

LLMs deployed multilingually are often audited via English explanations for non-English inputs. We evaluate extractive explanations, where the model identifies input token spans as evidence alongside a generated rationale, and uncover a systematic trade-off: English-pivot explanations can achieve higher span agreement with human rationales while their evidence becomes less causally grounded in the model's prediction, as measured by both comprehensiveness and sufficiency. Across 3 tasks, 5 languages, and 2 multilingual LLM families, we find that English explanations frequently produce fluent but loosely anchored rationales, with comprehensiveness degrading by up to 5.7× relative to native-language conditions, even as task accuracy remains stable across settings. For socially nuanced classification, English pivots also fail to preserve pragmatic cues, reducing both faithfulness and span agreement. We recommend auditing explanations in the input language, reporting multi-faceted faithfulness metrics beyond lexical overlap, and treating English rationales as communication summaries rather than faithful decision traces.
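To make the two faithfulness metrics concrete, here is a minimal sketch under the erasure-based definitions commonly used for extractive rationales: comprehensiveness is the drop in the predicted-class probability when the evidence tokens are removed, and sufficiency is the drop when only the evidence tokens are kept. The abstract does not spell out the exact formulation, so treat these definitions, the `predict_proba` interface, and the toy keyword classifier as assumptions for illustration only.

```python
from typing import Callable, List, Sequence, Set

# Assumed interface: maps a token list to a list of class probabilities.
PredictFn = Callable[[List[str]], List[float]]


def comprehensiveness(predict_proba: PredictFn, tokens: Sequence[str],
                      evidence: Set[int], label: int) -> float:
    """p(label | full input) - p(label | input with evidence tokens removed)."""
    full = predict_proba(list(tokens))[label]
    without_evidence = [t for i, t in enumerate(tokens) if i not in evidence]
    return full - predict_proba(without_evidence)[label]


def sufficiency(predict_proba: PredictFn, tokens: Sequence[str],
                evidence: Set[int], label: int) -> float:
    """p(label | full input) - p(label | evidence tokens only)."""
    full = predict_proba(list(tokens))[label]
    only_evidence = [t for i, t in enumerate(tokens) if i in evidence]
    return full - predict_proba(only_evidence)[label]


def toy_predict_proba(tokens: List[str]) -> List[float]:
    """Hypothetical stand-in for a real model: a keyword-based classifier."""
    p_negative = 0.9 if "terrible" in tokens else 0.2
    return [1.0 - p_negative, p_negative]


tokens = ["the", "food", "was", "terrible"]
evidence = {3}  # index of the token the explanation marks as evidence

comp = comprehensiveness(toy_predict_proba, tokens, evidence, label=1)
suff = sufficiency(toy_predict_proba, tokens, evidence, label=1)
print(f"comprehensiveness={comp:.2f}, sufficiency={suff:.2f}")
```

Under this convention, a larger comprehensiveness and a smaller sufficiency gap both indicate evidence that is more tightly tied to the prediction. A fluent rationale whose marked spans can be deleted without changing the output would score low on comprehensiveness, which is the pattern the abstract describes for English-pivot explanations.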

Problem

Research questions and friction points this paper is trying to address.

cross-lingual explanations

plausibility-faithfulness trade-off

multilingual LLMs

extractive explanations

explanation faithfulness

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-lingual explanations

faithfulness-plausibility trade-off

extractive rationales