RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation

📅 2025-06-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Reinforcement Learning from Human Feedback (RLHF) suffers severe performance degradation in spoken-language subtitle translation due to significant distributional shift between the offline reward model (RM) and the online LLM policy. Method: We propose RIVAL, a novel framework that formulates translation optimization as an iterative adversarial game between the RM and the translation model. It introduces reference-free hybrid reward modeling—jointly leveraging qualitative human preference signals and quantitative metrics (e.g., BLEU)—and dynamically co-adapts both the RM and the translation model to mitigate distributional shift. Results: Evaluated on multiple spoken-language translation benchmarks, RIVAL substantially outperforms state-of-the-art methods, achieving significant improvements in translation fluency, faithfulness, and alignment with human judgments.

📝 Abstract
Large language models (LLMs) possess strong multilingual capabilities, and combining Reinforcement Learning from Human Feedback (RLHF) with translation tasks has shown great potential. However, we observe that this paradigm performs unexpectedly poorly when applied to colloquial subtitle translation tasks. In this work, we investigate this issue and find that the offline reward model (RM) gradually diverges from the online LLM due to distributional shift, ultimately leading to undesirable training outcomes. To address this, we propose RIVAL, an adversarial training framework that formulates the process as a min-max game between the RM and the LLM. RIVAL iteratively updates both models, with the RM trained to distinguish strong from weak translations (a qualitative preference reward) and the LLM trained to improve its translations to close this gap. To stabilize training and improve generalizability, we also incorporate a quantitative preference reward (e.g., BLEU) into the RM, enabling reference-free quality modeling aligned with human evaluation. Through extensive experiments, we demonstrate that the proposed adversarial training framework significantly improves upon translation baselines.
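The iterative min-max loop described in the abstract can be sketched in miniature. The snippet below is an illustrative toy, not the paper's implementation: translations are plain strings, the "BLEU" term is a word-overlap stand-in, the qualitative reward is a simple learned word set, and `ToyRewardModel`, `rival_round`, and all parameters are hypothetical names invented for this sketch.

```python
def bleu_like(candidate, reference):
    """Toy stand-in for a quantitative metric such as BLEU:
    fraction of reference words covered by the candidate."""
    cand, ref = set(candidate.split()), set(reference.split())
    return len(cand & ref) / max(len(ref), 1)

class ToyRewardModel:
    """Hybrid reward: a learned qualitative signal blended with a
    quantitative metric, co-adapted each adversarial round."""
    def __init__(self, alpha=0.5):
        self.alpha = alpha            # qualitative/quantitative mix
        self.preferred_words = set()  # crude learned preference state

    def score(self, candidate, reference):
        words = candidate.split()
        qual = len(set(words) & self.preferred_words) / max(len(words), 1)
        quant = bleu_like(candidate, reference)
        return self.alpha * qual + (1 - self.alpha) * quant

    def update(self, strong, weak):
        # RM step of the min-max game: sharpen the distinction
        # between strong and weak translations.
        self.preferred_words |= set(strong.split()) - set(weak.split())

def rival_round(candidates, reference, rm):
    """One adversarial round: the policy's best candidate is chosen
    (a best-of-n proxy for the LLM update), then the RM adapts to the
    policy's current outputs to mitigate distributional shift."""
    ranked = sorted(candidates, key=lambda c: rm.score(c, reference),
                    reverse=True)
    strong, weak = ranked[0], ranked[-1]
    rm.update(strong, weak)
    return strong
```

Running several such rounds alternates the two updates: the RM widens the strong/weak gap while the policy closes it, which is the min-max structure RIVAL formalizes.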
Problem

Research questions and friction points this paper is trying to address.

Addresses poor performance of RLHF in colloquial subtitle translation
Mitigates reward model divergence due to distributional shift
Proposes adversarial training to align RM and LLM iteratively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial training framework for translation
Iterative updates for RM and LLM
Combines qualitative and quantitative rewards
Tianjiao Li
Bilibili Inc., Shanghai, China
Mengran Yu
Bilibili Inc., Shanghai, China
Chenyu Shi
School of Computer Science, Fudan University, China
Yanjun Zhao
UIUC
Xiaojing Liu
Bilibili Inc., Shanghai, China
Qiang Zhang
Bilibili Inc., Shanghai, China
Qi Zhang
School of Computer Science, Fudan University, China
Xuanjing Huang
School of Computer Science, Fudan University, China
Jiayin Wang
Tsinghua University