Improving LLMs for Machine Translation Using Synthetic Preference Data

📅 2025-08-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the scarcity of high-quality human preference data for machine translation in low-resource languages. We propose an efficient fine-tuning method leveraging synthetically generated preference data. Taking Slovenian as a case study, we first generate initial translation pairs using GaMS-9B-Instruct and EuroLLM-9B-Instruct; then construct high-quality ranked data via heuristic rules and COMET-based automatic evaluation; finally apply Direct Preference Optimization (DPO) for lightweight model adaptation—eliminating the need for costly human annotation. Experiments on Wikipedia translation show that the fine-tuned model achieves COMET score improvements of +0.04 and +0.02 over the two baselines, with notable gains in linguistic accuracy, formatting consistency, and cross-sentence coherence. Our key contribution is the principled integration of automated-evaluation-driven synthetic preference generation with DPO, establishing a scalable paradigm for optimizing low-resource LLMs for translation.
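The ranking step described above, in which two candidate translations are scored and the better one becomes the "chosen" example for DPO, can be sketched as follows. This is an illustrative reconstruction, not the paper's code: `score_fn` stands in for a COMET quality-estimation model, and `toy_score` is a hypothetical scorer used only so the example runs.

```python
# Sketch: turn two candidate translations per source sentence into
# (prompt, chosen, rejected) triples, the format consumed by DPO training.
# A real pipeline would score with a COMET model; here score_fn is injected.

def build_preference_pairs(examples, score_fn):
    """Rank the two candidates per source and emit DPO preference triples."""
    pairs = []
    for src, cand_a, cand_b in examples:
        score_a = score_fn(src, cand_a)
        score_b = score_fn(src, cand_b)
        if score_a == score_b:
            continue  # skip ties: no usable preference signal
        chosen, rejected = (cand_a, cand_b) if score_a > score_b else (cand_b, cand_a)
        pairs.append({"prompt": src, "chosen": chosen, "rejected": rejected})
    return pairs

# Toy scorer standing in for COMET (hypothetical values, illustration only).
_toy_scores = {"hello world": {"pozdravljen svet": 0.9, "zdravo svet": 0.7}}

def toy_score(src, hyp):
    return _toy_scores[src][hyp]

pairs = build_preference_pairs(
    [("hello world", "pozdravljen svet", "zdravo svet")], toy_score
)
```

Ties are dropped rather than broken arbitrarily, since a pair with no clear quality gap would add noise rather than preference signal to DPO training.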

📝 Abstract
Large language models have emerged as effective machine translation systems. In this paper, we explore how a general instruction-tuned large language model can be improved for machine translation using relatively few easily produced data resources. Using Slovene as a use case, we improve the GaMS-9B-Instruct model using Direct Preference Optimization (DPO) training on a programmatically curated and enhanced subset of a public dataset. As DPO requires pairs of quality-ranked instances, we generated its training dataset by translating English Wikipedia articles using two LLMs, GaMS-9B-Instruct and EuroLLM-9B-Instruct. We ranked the resulting translations based on heuristics coupled with automatic evaluation metrics such as COMET. The evaluation shows that our fine-tuned model outperforms both models involved in the dataset generation. In comparison to the baseline models, the fine-tuned model achieved a COMET score gain of around 0.04 and 0.02, respectively, on translating Wikipedia articles. It also more consistently avoids language and formatting errors.
Problem

Research questions and friction points this paper is trying to address.

Improving machine translation quality using synthetic preference data
Enhancing LLMs with few easily produced data resources
Optimizing translation performance via Direct Preference Optimization training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Direct Preference Optimization for translation enhancement
Synthetic preference data from dual LLM translations
Heuristic and metric-based translation ranking system
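A minimal sketch of what the heuristic side of such a ranking system might look like. The paper does not specify its exact rules, so these checks (empty output, copied source, length-ratio sanity) are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical heuristic pre-filter applied before metric-based ranking.
# Candidates failing these checks would be rejected outright.

def passes_heuristics(source: str, translation: str) -> bool:
    """Cheap sanity checks on a candidate translation (illustrative rules)."""
    if not translation.strip():
        return False  # empty output
    if translation.strip() == source.strip():
        return False  # model copied the source instead of translating
    # Length-ratio check: drastically longer or shorter outputs usually
    # indicate truncation or hallucinated repetition.
    ratio = len(translation) / max(len(source), 1)
    return 0.5 <= ratio <= 2.0
```

Combining cheap rule-based filters with a learned metric like COMET keeps the expensive scorer focused on candidates that are at least plausibly valid translations.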
Dario Vajda
University of Ljubljana, Faculty of Computer and Information Science
Domen Vreš
University of Ljubljana, Faculty of Computer and Information Science
Marko Robnik-Šikonja
Professor of Computer Science, University of Ljubljana, Head of ML & LT Lab
Machine Learning · Artificial Intelligence · Natural Language Processing · Explainable AI