🤖 AI Summary
To address the high annotation cost and reliance on large-scale supervised fine-tuning (SFT) when adapting large language models (LLMs) to machine translation, this paper proposes a data-efficient alignment method based on Contrastive Preference Optimization (CPO). The core method constructs high-quality preference pairs without additional human annotation: the base model's own raw translations serve as "rejected" responses, while human post-edited translations drawn from a translation memory (TM) serve as "chosen" responses. Integrating few-shot learning with implicit human-feedback modeling, the approach significantly improves data efficiency. Experiments on English–Brazilian Portuguese and English–Korean translation tasks demonstrate that, using only about 14,700 preference pairs, the method approaches the performance of SFT baselines trained on more than 160,000 samples, validating its substantial advantages in both data efficiency and domain adaptability.
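The pair-construction step can be illustrated with a short sketch. This is not the authors' code: `TMEntry`, `build_preference_pairs`, and the `translate` callable are hypothetical names, and the filter that drops pairs where the draft already matches the TM entry is an assumption about how trivial pairs might be handled.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TMEntry:
    source: str       # source-language segment
    post_edited: str  # human post-edited target stored in the TM

def build_preference_pairs(translate: Callable[[str], str],
                           tm: list[TMEntry]) -> list[dict]:
    """Pair the base model's raw draft ('rejected') with the
    human-approved TM translation ('chosen') for each segment."""
    pairs = []
    for entry in tm:
        draft = translate(entry.source)    # the model's current best guess
        if draft != entry.post_edited:     # assumption: skip trivial ties
            pairs.append({"prompt": entry.source,
                          "chosen": entry.post_edited,
                          "rejected": draft})
    return pairs
```

Using the model's own drafts as the rejected side is what makes the signal targeted: each pair contrasts what the model currently produces against the domain-approved reference for the same input.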
📝 Abstract
LLMs often require adaptation to domain-specific requirements, a process that can be expensive when relying solely on SFT. We present an empirical study on applying CPO to simulate a post-editing workflow for data-efficient domain adaptation. Our approach synthesizes preference pairs by treating the base model's own raw output as the 'rejected' translation and the human-approved translation memory (TM) entry as the 'chosen' one. This method provides direct feedback on the model's current knowledge, guiding it to align with domain-specific standards. Experiments in English-Brazilian Portuguese and English-Korean show that, by using just 14.7k preference pairs, the model achieves performance close to that of a model trained on 160k+ samples with SFT, demonstrating significant data efficiency. Although we showcase its effectiveness in machine translation (MT), this application of CPO naturally generalizes to other generative tasks where a model's initial drafts can serve as a contrastive signal against a golden reference.
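For reference, below is a minimal sketch of the CPO objective as defined by Xu et al. (2024), combining a reference-free preference term with a negative log-likelihood regularizer on the chosen output; we assume the paper uses this standard formulation. The inputs are per-sequence summed token log-probabilities under the policy being trained, and `beta` is illustrative.

```python
import torch
import torch.nn.functional as F

def cpo_loss(logp_chosen: torch.Tensor,
             logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """CPO objective (Xu et al., 2024): preference term + NLL term.

    Both inputs are summed token log-probabilities per sequence under
    the policy model, shape (batch,).
    """
    # Reference-free contrast: push the post-edited ('chosen') translation
    # above the model's own raw draft ('rejected').
    prefer = -F.logsigmoid(beta * (logp_chosen - logp_rejected)).mean()
    # Behavior-cloning regularizer: plain NLL on the chosen translation,
    # keeping the policy anchored to the human-approved outputs.
    nll = -logp_chosen.mean()
    return prefer + nll
```

Unlike DPO, this objective needs no frozen reference model, which keeps memory and compute costs down and fits the data-efficiency goal of the paper.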