Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization

📅 2025-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit high hallucination rates in machine translation, particularly in low-resource and zero-shot cross-lingual settings, which undermines reliability and user trust. To address this, we propose a training-stage paradigm that suppresses hallucinations intrinsically, eliminating reliance on post-hoc detection and retranslation. Our contributions are threefold: (1) the first hallucination-oriented preference data construction framework, enabling end-to-end hallucination mitigation; (2) DPO-based alignment fine-tuning integrated with a multilingual hallucination annotation schema; and (3) the first empirical validation of generalization under zero-shot cross-lingual transfer. Experiments across five language pairs show an average 96% reduction in hallucination rate; when extended zero-shot to three unseen target languages, hallucinations decrease by 89%, with no degradation in translation quality (e.g., BLEU).
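The preference data construction described above pairs each source sentence with a faithful (preferred) translation and a hallucinated (dispreferred) one. The paper does not publish its exact schema; the sketch below is an illustrative guess at what such a record might look like, with all field and function names invented for this example.

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    """One hallucination-focused preference example (hypothetical schema)."""
    source: str      # source-language sentence
    chosen: str      # faithful translation (preferred response)
    rejected: str    # hallucinated translation (dispreferred response)
    lang_pair: str   # e.g. "en-de"

def build_pairs(sources, faithful, hallucinated, lang_pair):
    """Zip parallel lists into preference pairs for alignment fine-tuning."""
    return [
        PreferencePair(s, c, r, lang_pair)
        for s, c, r in zip(sources, faithful, hallucinated)
    ]

pairs = build_pairs(
    ["The cat sat on the mat."],
    ["Die Katze saß auf der Matte."],
    ["Der Hund lief durch den Park."],  # unrelated output used as the rejected side
    "en-de",
)
```

Datasets in this chosen/rejected format are what DPO-style trainers consume directly, so no reward model is needed.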

📝 Abstract
Machine Translation (MT) is undergoing a paradigm shift, with systems based on fine-tuned large language models (LLMs) becoming increasingly competitive with traditional encoder-decoder models trained specifically for translation tasks. However, LLM-based systems are at a higher risk of generating hallucinations, which can severely undermine users' trust and safety. Most prior research on hallucination mitigation focuses on traditional MT models, with solutions that involve post-hoc mitigation: detecting hallucinated translations and re-translating them. While effective, this approach introduces additional complexity in deploying extra tools in production and also increases latency. To address these limitations, we propose a method that intrinsically learns to mitigate hallucinations during the model training phase. Specifically, we introduce a data creation framework to generate hallucination-focused preference datasets. Fine-tuning LLMs on these preference datasets reduces the hallucination rate by an average of 96% across five language pairs, while preserving overall translation quality. In a zero-shot setting, our approach reduces hallucinations by 89% on average across three unseen target languages.
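Fine-tuning on such preference datasets typically uses the standard DPO objective, which pushes the policy to prefer the faithful translation over the hallucinated one relative to a frozen reference model. The function below is a minimal single-pair sketch of that loss (not the paper's implementation); the numeric log-probabilities in the example are made up for illustration.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))),
    where each term is a sequence-level log-probability.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Policy favors the faithful translation more than the reference does -> low loss:
low = dpo_loss(-10.0, -30.0, -12.0, -25.0)
# Reversed preference (policy favors the hallucination) -> high loss:
high = dpo_loss(-30.0, -10.0, -25.0, -12.0)
```

Minimizing this loss widens the policy's log-probability gap between faithful and hallucinated translations while the KL-like anchoring to the reference model helps preserve overall translation quality.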
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Machine Translation
Error Propensity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine Translation
Error Reduction
Training Data Generation