Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation

📅 2025-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses key bottlenecks in low-resource machine translation (MT), including scarce parallel corpora, limited linguistic tools, and constrained computational resources, by systematically surveying the paradigm evolution of large language models (LLMs) for MT. Methodologically, it integrates few-shot prompting, cross-lingual transfer, parameter-efficient fine-tuning (e.g., LoRA), LLM-driven synthetic data generation (e.g., back-translation, lexical augmentation), and LLM-driven quality evaluation metrics. It comprehensively identifies challenges specific to LLM-based MT, namely hallucination, bias propagation, and evaluation inconsistency, and empirically delineates performance boundaries between LLM-based and traditional encoder-decoder MT across multilingual settings. The work contributes a principled, inclusive, and scalable MT development framework; provides a reproducible technical decision guide with benchmark comparisons; and establishes foundations for robust, trustworthy MT in low-resource scenarios.
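The few-shot prompting technique named above can be sketched as prompt assembly: a handful of demonstration pairs are placed before the sentence to translate. The language pair, template wording, and example pairs below are illustrative assumptions, not taken from the surveyed paper.

```python
# Minimal sketch of few-shot prompting for machine translation.
# The prompt template and the English-Yoruba demonstration pairs
# are hypothetical; any LLM completion API could consume the result.

def build_few_shot_prompt(examples, source_sentence,
                          src_lang="English", tgt_lang="Yoruba"):
    """Assemble a few-shot translation prompt from (source, target) pairs."""
    lines = [f"Translate from {src_lang} to {tgt_lang}."]
    for src, tgt in examples:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    # The sentence to translate goes last, leaving the target side open
    # for the model to complete.
    lines.append(f"{src_lang}: {source_sentence}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)

demo_pairs = [
    ("Good morning.", "E kaaro."),
    ("Thank you.", "E se."),
]
prompt = build_few_shot_prompt(demo_pairs, "How are you?")
print(prompt)
```

In practice the number and selection of demonstrations (e.g., retrieved by similarity to the input) strongly affect low-resource translation quality, which is one of the adaptation levers the survey discusses.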

📝 Abstract
The advent of Large Language Models (LLMs) has significantly reshaped the landscape of machine translation (MT), particularly for low-resource languages and domains that lack sufficient parallel corpora, linguistic tools, and computational infrastructure. This survey presents a comprehensive overview of recent progress in leveraging LLMs for MT. We analyze techniques such as few-shot prompting, cross-lingual transfer, and parameter-efficient fine-tuning that enable effective adaptation to under-resourced settings. The paper also explores synthetic data generation strategies using LLMs, including back-translation and lexical augmentation. Additionally, we compare LLM-based translation with traditional encoder-decoder models across diverse language pairs, highlighting the strengths and limitations of each. We discuss persistent challenges such as hallucinations, evaluation inconsistencies, and inherited biases while also evaluating emerging LLM-driven metrics for translation quality. This survey offers practical insights and outlines future directions for building robust, inclusive, and scalable MT systems in the era of large-scale generative models.
Problem

Research questions and friction points this paper is trying to address.

Leveraging LLMs for low-resource machine translation
Addressing data scarcity with synthetic generation techniques
Evaluating LLM strengths and limitations versus traditional MT models
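The data-scarcity point above is typically attacked with back-translation: monolingual target-side text is translated back into the source language to manufacture synthetic parallel pairs. The sketch below stubs the reverse translation step with a lookup table purely for illustration; in practice that step would be an LLM or MT system prompted in the target-to-source direction.

```python
# Sketch of back-translation for synthetic parallel data.
# `reverse_translate` stands in for any target-to-source MT system;
# the lookup table is a toy stub, not a real model.

def reverse_translate(target_sentence, table):
    return table.get(target_sentence, "<unk>")

def back_translate(monolingual_target, table):
    """Pair each target-side sentence with a model-generated source side."""
    return [(reverse_translate(t, table), t) for t in monolingual_target]

stub_table = {"E kaaro.": "Good morning.", "E se.": "Thank you."}
synthetic = back_translate(["E kaaro.", "E se."], stub_table)
print(synthetic)  # synthetic (source, target) pairs for training
```

The key property, which the toy preserves, is that the target side is genuine human text while the source side is model-generated, so training on these pairs teaches the forward model to produce fluent target-language output.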
Innovation

Methods, ideas, or system contributions that make the work stand out.

Few-shot prompting for low-resource adaptation
Synthetic data generation via back-translation
Parameter-efficient fine-tuning techniques
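The last bullet's flagship method, LoRA, freezes the pretrained weight matrix and learns only a low-rank update. A minimal numeric sketch of the effective weight, W_eff = W + (alpha / r) * B @ A, using plain Python lists and toy numbers chosen for illustration:

```python
# Toy sketch of a LoRA-style low-rank weight update (no ML framework).
# W is frozen; only the small factors A and B would be trained.

def matmul(X, Y):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    """Frozen weight W plus the scaled low-rank product B @ A."""
    scale = alpha / r
    delta = matmul(B, A)          # (out, r) @ (r, in) -> (out, in)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy 2x2 weight with a rank-1 adapter (r = 1).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]                  # shape (r=1, in=2)
B = [[0.5], [0.25]]               # shape (out=2, r=1)
W_eff = lora_effective_weight(W, A, B, alpha=1.0, r=1)
print(W_eff)                      # [[1.5, 1.0], [0.25, 1.5]]
```

Because only A and B are updated, the trainable parameter count scales with the rank r rather than the full weight dimensions, which is what makes the approach viable on the constrained hardware typical of low-resource MT work.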