🤖 AI Summary
This study addresses an overlooked issue in auditing multilingual large language models: explanations for non-English inputs are often produced in English, which may compromise causal faithfulness and lose sociopragmatic information. The work systematically evaluates such “English-mediated explanations” in cross-lingual settings using extractive explanations, measuring span agreement with human rationales and causal faithfulness via comprehensiveness and sufficiency. Across 3 tasks, 5 languages, and 2 multilingual LLM families, English-mediated explanations remain fluent and task accuracy stays stable, and they can even reach higher span agreement with human rationales, but their evidence is less causally grounded in the model's prediction: comprehensiveness degrades by up to 5.7× relative to native-language conditions. For socially nuanced classification, English pivots also fail to preserve pragmatic cues, reducing both faithfulness and span agreement. These results highlight the hidden cost of using English as an explanatory intermediary.
📝 Abstract
LLMs deployed multilingually are often audited via English explanations for non-English inputs. We evaluate extractive explanations (where the model identifies input token spans as evidence alongside a generated rationale) and uncover a systematic trade-off: English-pivot explanations can achieve higher span agreement with human rationales while their evidence becomes less causally grounded in the model's prediction, as measured by both comprehensiveness and sufficiency. Across 3 tasks, 5 languages, and 2 multilingual LLM families, we find that English explanations frequently produce fluent but loosely anchored rationales, with comprehensiveness degrading by up to 5.7x relative to native-language conditions, even as task accuracy remains stable across settings. For socially nuanced classification, English pivots also fail to preserve pragmatic cues, reducing both faithfulness and span agreement. We recommend auditing explanations in the input language, reporting multi-faceted faithfulness metrics beyond lexical overlap, and treating English rationales as communication summaries rather than faithful decision traces.
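To make the three quantities in the abstract concrete, here is a minimal Python sketch. It assumes the commonly used erasure-based definitions: comprehensiveness is the drop in the predicted-class probability when the evidence tokens are removed, and sufficiency is the drop when only the evidence tokens are kept. It also assumes span agreement is scored as token-level F1 against the human rationale. The abstract does not give the exact formulations used in the paper, so these definitions, the function names, and the `toy_predict` model are all illustrative assumptions.

```python
from typing import Callable, Sequence, Set

# A predictor maps a token sequence to the probability of the predicted class.
Predictor = Callable[[Sequence[str]], float]


def comprehensiveness(predict: Predictor, tokens: Sequence[str], evidence: Set[int]) -> float:
    """Probability drop after removing the evidence tokens (higher = more faithful)."""
    without_evidence = [t for i, t in enumerate(tokens) if i not in evidence]
    return predict(tokens) - predict(without_evidence)


def sufficiency(predict: Predictor, tokens: Sequence[str], evidence: Set[int]) -> float:
    """Probability drop when keeping only the evidence tokens (lower = more faithful)."""
    only_evidence = [t for i, t in enumerate(tokens) if i in evidence]
    return predict(tokens) - predict(only_evidence)


def span_f1(predicted: Set[int], gold: Set[int]) -> float:
    """Token-level F1 between model-selected and human-annotated evidence indices."""
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def toy_predict(tokens: Sequence[str]) -> float:
    """Hypothetical stand-in for a classifier: a base score plus fixed cue weights."""
    cue_weights = {"terrible": 0.5, "awful": 0.25}
    return 0.125 + sum(cue_weights.get(t, 0.0) for t in tokens)


if __name__ == "__main__":
    tokens = ["the", "food", "was", "terrible", "and", "awful"]
    model_evidence = {3}      # the model points at "terrible"
    human_evidence = {3, 5}   # annotators marked "terrible" and "awful"
    print(comprehensiveness(toy_predict, tokens, model_evidence))
    print(sufficiency(toy_predict, tokens, model_evidence))
    print(span_f1(model_evidence, human_evidence))
```

Under these assumed definitions, span agreement and faithfulness can move independently: evidence that overlaps well with human rationales may still produce a small probability drop when removed, which is the kind of trade-off the abstract describes for English-pivot explanations.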