🤖 AI Summary
This paper addresses the English-centric bias in large language models (LLMs) that stems from imbalanced training corpora, systematically evaluating the efficacy of translation preprocessing, particularly translation into English, for multilingual tasks. Moving beyond conventional NLP benchmarks, the study introduces real-world user queries, culture-sensitive tasks, and non-English-centric LLMs to conduct a cross-lingual empirical analysis. Results show that while translating inputs into English improves performance on certain tasks for English-dominant models, native-language prompting significantly outperforms translation on tasks requiring deep cultural and linguistic understanding. Behavior also varies markedly across models and task types. The work challenges the assumption that translation into English is a universally beneficial optimization, exposing the limitations of English-centric evaluation practices, and advocates a multilingual evaluation framework that explicitly accounts for language-specific properties, cultural context, and model architectural and training biases.
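As a rough illustration of the two prompting strategies the paper compares, the sketch below contrasts translating a non-English query into English before prompting an LLM with prompting directly in the native language. The helpers `translate_to_english` and `query_llm` are hypothetical placeholders for whatever translation system and LLM backend an evaluation actually uses; they are not from the paper or any specific library.

```python
# Minimal sketch of the two prompting strategies discussed above:
# (1) translate the user query into English, then prompt the LLM;
# (2) prompt the LLM directly in the user's native language.
# `translate_to_english` and `query_llm` are hypothetical stand-ins.

def translate_to_english(text: str) -> str:
    """Placeholder for a machine-translation call (MT model or API of your choice)."""
    raise NotImplementedError("Plug in a real translation backend here.")


def query_llm(prompt: str) -> str:
    """Placeholder for an LLM completion call."""
    raise NotImplementedError("Plug in a real LLM backend here.")


def answer_via_english(query: str) -> str:
    # Strategy 1: pre-translate into English, which the paper finds can help
    # English-centric LLMs on many conventional NLP tasks.
    english_query = translate_to_english(query)
    return query_llm(english_query)


def answer_in_native_language(query: str) -> str:
    # Strategy 2: keep the query in its original language, which the paper finds
    # works better for culture-related tasks that need deep language understanding.
    return query_llm(query)
```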
📝 Abstract
Large language models (LLMs) have demonstrated multilingual capabilities, yet they remain mostly English-centric due to imbalanced training corpora. While prior work has leveraged this bias to enhance multilingual performance through translation, it has been largely limited to natural language processing (NLP) tasks. In this work, we extend the evaluation to real-world user queries and to non-English-centric LLMs, offering a broader examination of multilingual performance. Our key contribution lies in demonstrating that while translation into English can boost the performance of English-centric LLMs on NLP tasks, it is not universally optimal. For culture-related tasks that require deep language understanding, prompting in the native language proves more effective, as it better captures the nuances of culture and language. Our experiments reveal varied behavior across LLMs and tasks in the multilingual context, underscoring the need for a more comprehensive approach to multilingual evaluation. We therefore call for greater effort in developing and evaluating LLMs that go beyond English-centric paradigms.