🤖 AI Summary
This paper addresses the English-centric bias in large language models (LLMs) that stems from imbalanced training corpora, systematically evaluating the efficacy of translation preprocessing, particularly translation into English, for multilingual tasks. Moving beyond conventional NLP benchmarks, the study introduces real-world user queries, culture-sensitive tasks, and non-English-centric LLMs to conduct a cross-lingual empirical analysis. Results show that while translating inputs into English improves performance on certain tasks for English-dominant models, native-language prompting significantly outperforms translation on tasks requiring deep cultural and linguistic understanding. Behavior also varies markedly across models and task types. The work challenges the assumption that translation into English is a universally beneficial optimization, exposing the limitations of English-centric evaluation practices, and advocates a multilingual evaluation framework that explicitly accounts for language-specific properties, cultural context, and model architectural and training biases.
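As a rough illustration of the two prompting strategies the paper compares, the sketch below contrasts translating a non-English query into English before prompting an LLM with prompting directly in the native language. The helpers `translate_to_english` and `query_llm` are hypothetical placeholders for whatever translation system and LLM backend an evaluation actually uses; they are not from the paper or any specific library.

```python
# Minimal sketch of the two prompting strategies discussed above:
# (1) translate the user query into English, then prompt the LLM;
# (2) prompt the LLM directly in the user's native language.
# `translate_to_english` and `query_llm` are hypothetical stand-ins.

def translate_to_english(text: str) -> str:
    """Placeholder for a machine-translation call (MT model or API of your choice)."""
    raise NotImplementedError("Plug in a real translation backend here.")


def query_llm(prompt: str) -> str:
    """Placeholder for an LLM completion call."""
    raise NotImplementedError("Plug in a real LLM backend here.")


def answer_via_english(query: str) -> str:
    # Strategy 1: pre-translate into English, which the paper finds can help
    # English-centric LLMs on many conventional NLP tasks.
    english_query = translate_to_english(query)
    return query_llm(english_query)


def answer_in_native_language(query: str) -> str:
    # Strategy 2: keep the query in its original language, which the paper finds
    # works better for culture-related tasks that need deep language understanding.
    return query_llm(query)
```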
📝 Abstract
Large language models (LLMs) have demonstrated multilingual capabilities, yet they remain mostly English-centric due to imbalanced training corpora. While prior work has leveraged this bias to enhance multilingual performance through translation, it has been largely limited to natural language processing (NLP) tasks. In this work, we extend the evaluation to real-world user queries and to non-English-centric LLMs, offering a broader examination of multilingual performance. Our key contribution lies in demonstrating that while translation into English can boost the performance of English-centric LLMs on NLP tasks, it is not universally optimal. For culture-related tasks that require deep language understanding, prompting in the native language proves more effective, as it better captures the nuances of culture and language. Our experiments reveal varied behavior across LLMs and tasks in the multilingual context, underscoring the need for a more comprehensive approach to multilingual evaluation. We therefore call for greater effort in developing and evaluating LLMs that go beyond English-centric paradigms.