🤖 AI Summary
This study addresses the scarcity of high-quality human preference data for machine translation in low-resource languages. We propose an efficient fine-tuning method that leverages synthetically generated preference data. Taking Slovene as a case study, we first generate candidate translations with GaMS-9B-Instruct and EuroLLM-9B-Instruct; we then construct quality-ranked preference pairs via heuristic rules and COMET-based automatic evaluation; finally, we apply Direct Preference Optimization (DPO) for lightweight model adaptation, eliminating the need for costly human annotation. Experiments on Wikipedia translation show that the fine-tuned model improves COMET scores by +0.04 and +0.02 over the two baselines, with notable gains in linguistic accuracy, formatting consistency, and cross-sentence coherence. Our key contribution is the principled integration of synthetic preference generation, driven by automatic evaluation, with DPO, establishing a scalable paradigm for optimizing low-resource LLMs for translation.
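The ranking step can be pictured concretely: score each candidate translation with an automatic metric and keep the higher-scoring one as the preferred response. Below is a minimal sketch using the unbabel-comet package; the reference-free CometKiwi checkpoint, the helper function, and the data layout are illustrative assumptions (the paper's heuristic filters are omitted here), not the authors' exact pipeline.

```python
# Sketch: rank two candidate translations per source sentence with a
# reference-free COMET model and emit DPO preference pairs. Checkpoint
# name and function layout are assumptions for illustration.
from comet import download_model, load_from_checkpoint

ckpt_path = download_model("Unbabel/wmt22-cometkiwi-da")  # reference-free QE model
scorer = load_from_checkpoint(ckpt_path)

def build_preference_pairs(sources, gams_outputs, eurollm_outputs):
    """Score both systems' translations and keep the higher-scoring
    one as 'chosen' and the other as 'rejected' for DPO training."""
    data = [{"src": s, "mt": m} for s, m in zip(sources, gams_outputs)]
    data += [{"src": s, "mt": m} for s, m in zip(sources, eurollm_outputs)]
    # predict() returns an object with per-segment .scores; gpus=0 for CPU.
    scores = scorer.predict(data, batch_size=8, gpus=1).scores

    n = len(sources)
    pairs = []
    for i, src in enumerate(sources):
        gams_score, euro_score = scores[i], scores[n + i]
        if gams_score >= euro_score:
            chosen, rejected = gams_outputs[i], eurollm_outputs[i]
        else:
            chosen, rejected = eurollm_outputs[i], gams_outputs[i]
        pairs.append({"prompt": src, "chosen": chosen, "rejected": rejected})
    return pairs
```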
📝 Abstract
Large language models have emerged as effective machine translation systems. In this paper, we explore how a general instruction-tuned large language model can be improved for machine translation using a small amount of easily produced data. Using Slovene as a use case, we improve the GaMS-9B-Instruct model with Direct Preference Optimization (DPO) training on a programmatically curated and enhanced subset of a public dataset. Since DPO requires pairs of quality-ranked instances, we generated its training dataset by translating English Wikipedia articles with two LLMs, GaMS-9B-Instruct and EuroLLM-9B-Instruct, and ranked the resulting translations using heuristics coupled with automatic evaluation metrics such as COMET. The evaluation shows that our fine-tuned model outperforms both models involved in the dataset generation: compared to the two baselines, it achieved COMET score gains of around 0.04 and 0.02, respectively, on translating Wikipedia articles, and it more consistently avoids language and formatting errors.
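Once such preference pairs exist, the DPO step itself is lightweight. The following is a minimal sketch using Hugging Face TRL's DPOTrainer; the model identifier, hyperparameters, and the toy dataset are illustrative assumptions, not the authors' configuration.

```python
# Sketch: lightweight DPO fine-tuning on COMET-ranked preference pairs
# using Hugging Face TRL. Model ID, hyperparameters, and the toy example
# below are assumptions, not the paper's exact configuration.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "cjvt/GaMS-9B-Instruct"  # assumed Hugging Face model ID
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Each row pairs a prompt with a chosen (higher-COMET) and a rejected
# (lower-COMET) translation, e.g. as produced by the ranking sketch above.
train_dataset = Dataset.from_list([
    {"prompt": "Translate into Slovene: The results are shown in Table 2.",
     "chosen": "Rezultati so prikazani v tabeli 2.",
     "rejected": "Rezultati so pokazani v Table 2."},
])

config = DPOConfig(
    output_dir="gams-9b-dpo-mt",
    beta=0.1,                       # strength of the preference penalty
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                    # reference model is cloned internally
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```

Because DPO only needs a frozen copy of the starting model as its reference, no reward model is trained, which is what keeps this adaptation step cheap relative to RLHF-style pipelines.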