🤖 AI Summary
This study investigates the cross-lingual transferability of English reward models (RMs) to non-English languages for improving instruction-following and alignment in multilingual RLHF. We propose a systematic evaluation framework integrating Multilingual RewardBench, representation shift analysis, multilingual instruction fine-tuning, and offline RM comparison. Our central empirical finding is that English RMs, applied directly without translation, outperform corresponding target-language RMs by 3–4% on average across multilingual reward evaluation—demonstrating strong generalization rooted in cross-lingual alignment of the representation space and transferable instruction-understanding capabilities. We further validate that this transfer significantly enhances multilingual instruction-following performance. To foster reproducibility and community advancement, we open-source all code, models, and datasets, establishing a new paradigm for multilingual RLHF.
📝 Abstract
Reinforcement learning with human feedback (RLHF) has been shown to benefit substantially from precise reward models (RMs). However, recent studies of reward modeling are skewed toward English, limiting the applicability of RLHF to multilingual alignment. In this work, we investigate the cross-lingual transfer of RMs, primarily from English to diverse target languages. Our experimental results demonstrate the strong cross-lingual transfer of English RMs, which exceed target-language RMs by a 3–4% average increase on Multilingual RewardBench. Furthermore, we analyze the cross-lingual transfer of RMs through representation shifts. Finally, we perform multilingual alignment to exemplify how cross-lingual transfer in RMs propagates to enhanced multilingual instruction-following capability, along with extensive analyses of off-the-shelf RMs. We release the code, model, and data.
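RewardBench-style benchmarks such as the Multilingual RewardBench referenced above typically evaluate an RM by pairwise accuracy: the fraction of (chosen, rejected) response pairs on which the RM assigns the chosen response a higher score. A minimal sketch of that metric, where `toy_rm` is a hypothetical stand-in for a real scalar-scoring reward model (not the paper's model):

```python
def pairwise_accuracy(pairs, score):
    """Fraction of (chosen, rejected) pairs where the RM ranks chosen higher.

    pairs: list of (chosen, rejected) response strings.
    score: callable mapping a response string to a scalar reward.
    """
    correct = sum(score(chosen) > score(rejected) for chosen, rejected in pairs)
    return correct / len(pairs)

# Toy scoring function (hypothetical): pretend longer answers are better.
toy_rm = lambda text: len(text)

pairs = [
    ("A detailed, helpful answer.", "ok"),
    ("Step-by-step explanation with examples.", "no idea"),
    ("short", "a much longer but rejected reply"),
]
# toy_rm ranks 2 of the 3 pairs correctly, giving accuracy 2/3.
print(pairwise_accuracy(pairs, toy_rm))
```

The reported 3–4% gap between English and target-language RMs is a difference in exactly this kind of accuracy, averaged over languages.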