🤖 AI Summary
This study systematically investigates how human-plausible errors (e.g., spelling and grammatical deviations) and synthetic noise (character-level and composite) in user prompts affect large language models' (LLMs) performance on machine translation (MT) and MT evaluation. Using controlled noise injection, combined with quantitative evaluation and qualitative analysis, the authors find that prompt quality critically shapes model behavior: low-quality prompts primarily impair instruction following rather than the intrinsic quality of the translations, with composite and character-level noise proving most detrimental. A key finding concerns prompt robustness: even when prompts are distorted to the point of human unreadability, LLMs retain basic translation capability. These results provide empirical grounding for prompt engineering, robustness assessment, and human-AI collaborative translation practice.
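The character-level noise injection described above can be sketched as follows. This is a hypothetical illustration only: the function name, the set of edit operations (delete, substitute, duplicate), and the per-character noise rate are assumptions for demonstration, not the paper's actual noiser implementation.

```python
import random
import string

def char_noise(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Inject character-level noise: with probability `rate`, delete,
    substitute, or duplicate each alphabetic character.

    Hypothetical sketch of a character-level noiser; the operations and
    rate are illustrative assumptions, not the study's implementation.
    """
    rng = random.Random(seed)  # fixed seed for reproducible perturbations
    out = []
    for c in text:
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(["delete", "substitute", "duplicate"])
            if op == "delete":
                continue  # drop the character entirely
            elif op == "substitute":
                out.append(rng.choice(string.ascii_lowercase))
            else:  # duplicate
                out.append(c)
                out.append(c)
        else:
            out.append(c)  # leave the character untouched
    return "".join(out)

# Example: progressively corrupt a translation prompt.
prompt = "Translate the following sentence into German."
for rate in (0.0, 0.2, 0.5):
    print(f"rate={rate}: {char_noise(prompt, rate=rate)}")
```

A composite noiser could then be built by chaining several such functions (e.g., character-level plus word-level perturbations), which matches the study's observation that combined noise degrades instruction following the most.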
📝 Abstract
Large language models (LLMs) have achieved top results in recent machine translation evaluations, but they are also known to be sensitive to errors and perturbations in their prompts. We systematically evaluate how both humanly plausible and synthetic errors in user prompts affect LLMs' performance on two related tasks: machine translation and machine translation evaluation. We provide both a quantitative analysis and qualitative insights into how the models respond to increasing noise in the user prompt.
Prompt quality strongly affects translation performance: with many errors, even a good prompt can underperform a minimal or poor prompt without errors. However, different noise types impact translation quality differently, with character-level and combined noisers degrading performance more than phrasal perturbations. Qualitative analysis reveals that lower prompt quality largely leads to poorer instruction following rather than directly degrading translation quality itself. Furthermore, LLMs can still translate under overwhelming random noise that would render the prompt illegible to humans.