🤖 AI Summary
Large language models (LLMs) exhibit high hallucination rates in machine translation—particularly in low-resource and zero-shot cross-lingual settings—compromising reliability and introducing security risks. To address this, we propose a novel training-stage paradigm for endogenous hallucination suppression, eliminating reliance on post-hoc detection or retranslation. Our contributions are threefold: (1) the first hallucination-oriented preference data construction framework enabling end-to-end hallucination mitigation; (2) DPO-based alignment fine-tuning integrated with a multilingual hallucination annotation schema; and (3) the first empirical validation of generalization under zero-shot cross-lingual transfer. Experiments across five language pairs show an average 96% reduction in hallucination rate; when extended zero-shot to three unseen target languages, hallucination decreases by 89%, with no degradation in translation quality (e.g., BLEU).
📝 Abstract
Machine Translation (MT) is undergoing a paradigm shift, with systems based on fine-tuned large language models (LLM) becoming increasingly competitive with traditional encoder-decoder models trained specifically for translation tasks. However, LLM-based systems are at a higher risk of generating hallucinations, which can severely undermine user's trust and safety. Most prior research on hallucination mitigation focuses on traditional MT models, with solutions that involve post-hoc mitigation - detecting hallucinated translations and re-translating them. While effective, this approach introduces additional complexity in deploying extra tools in production and also increases latency. To address these limitations, we propose a method that intrinsically learns to mitigate hallucinations during the model training phase. Specifically, we introduce a data creation framework to generate hallucination focused preference datasets. Fine-tuning LLMs on these preference datasets reduces the hallucination rate by an average of 96% across five language pairs, while preserving overall translation quality. In a zero-shot setting our approach reduces hallucinations by 89% on an average across three unseen target languages.