🤖 AI Summary
Problem: Reinforcement Learning from Human Feedback (RLHF) degrades severely on spoken-language subtitle translation because of distributional shift between the offline reward model (RM) and the online LLM policy.
Method: We propose RIVAL, a framework that formulates translation optimization as an iterative adversarial game between the RM and the translation model. It introduces reference-free hybrid reward modeling, which combines qualitative human preference signals with quantitative metrics (e.g., BLEU), and it iteratively co-adapts the RM and the translation model to mitigate distributional shift.
Results: Evaluated on multiple spoken-language translation benchmarks, RIVAL substantially outperforms state-of-the-art methods, achieving significant improvements in translation fluency, faithfulness, and alignment with human judgments.
📝 Abstract
Large language models (LLMs) possess strong multilingual capabilities, and combining Reinforcement Learning from Human Feedback (RLHF) with translation tasks has shown great potential. However, we observe that this paradigm performs unexpectedly poorly when applied to colloquial subtitle translation. In this work, we investigate this issue and find that the offline reward model (RM) gradually diverges from the online LLM due to distributional shift, ultimately leading to undesirable training outcomes. To address this, we propose RIVAL, an adversarial training framework that formulates the process as a min-max game between the RM and the LLM. RIVAL iteratively updates both models: the RM is trained to distinguish strong from weak translations (the qualitative preference reward), and the LLM is trained to improve its translations so as to close this gap. To stabilize training and improve generalizability, we also incorporate a quantitative preference reward (e.g., BLEU) into the RM, enabling reference-free quality modeling aligned with human evaluation. Through extensive experiments, we demonstrate that the proposed adversarial training framework significantly improves upon translation baselines.
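As an illustrative sketch only, the min-max game described above could be written in a GAN-style form. The notation here is assumed rather than taken from the paper: $\pi_\theta$ is the translation LLM, $r_\phi$ is the RM's qualitative score, $q_\phi$ is a hypothetical RM head that predicts a metric such as BLEU, $y^{+}$ is a strong translation of source $x$, and $y^{\mathrm{ref}}$ is a reference available only at training time. The paper's exact objective may differ.

```latex
% Assumed qualitative game: the RM (phi) widens the score gap between strong
% translations y^+ and policy samples y; the LLM (theta) tries to close it.
\min_{\theta}\,\max_{\phi}\;
  \mathbb{E}_{x \sim \mathcal{D}}
  \Big[\, r_\phi(x, y^{+})
        - \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\, r_\phi(x, y) \Big]

% Assumed quantitative term: the RM regresses a metric computed against a
% reference, so that at scoring time it needs only (x, y).
\mathcal{L}_{\mathrm{quant}}(\phi) =
  \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}
  \Big[ \big( q_\phi(x, y) - \mathrm{BLEU}(y, y^{\mathrm{ref}}) \big)^{2} \Big]
```

Under this reading, each iteration alternates an RM update on fresh samples from the current $\pi_\theta$ with a policy update against the refreshed RM. Retraining the RM on current samples is what keeps it from drifting away from the policy's output distribution.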