Beyond English-Centric Training: How Reinforcement Learning Improves Cross-Lingual Reasoning in LLMs

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how reinforcement learning (RL) and supervised fine-tuning (SFT) affect the cross-lingual reasoning generalization of large language models (LLMs), focusing on non-English mathematical, commonsense, and scientific reasoning tasks. Using Qwen2.5-3B-Base as the base model, we conduct systematic performance comparisons between RL and SFT on multilingual reasoning benchmarks and analyze the mechanistic role of non-English training data. Our key contributions are: (1) the first empirical demonstration that RL substantially outperforms SFT in cross-lingual reasoning, exhibiting superior generalization and robustness across languages; and (2) evidence that incorporating non-English data during training effectively mitigates English-centric bias, leading to consistent improvements in multilingual reasoning accuracy and cross-lingual strategy transfer. These findings provide both a novel methodology and empirical foundation for developing truly multilingual, general-purpose reasoning models.
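The comparison described above boils down to scoring each tuned checkpoint per language and measuring how much accuracy drops outside English. A minimal sketch of that bookkeeping is below; the `accuracy_by_language` and `generalization_gap` helpers and all data values are hypothetical placeholders, not the paper's harness or results.

```python
# Hypothetical sketch: comparing per-language exact-match accuracy of two
# fine-tuned checkpoints (RL vs. SFT) on a multilingual reasoning benchmark.
# All predictions/references below are illustrative placeholders.

def accuracy_by_language(predictions, references):
    """Return {language: fraction of exact-match answers}."""
    scores = {}
    for lang, preds in predictions.items():
        refs = references[lang]
        correct = sum(p == r for p, r in zip(preds, refs))
        scores[lang] = correct / len(refs)
    return scores

def generalization_gap(acc):
    """English accuracy minus mean non-English accuracy (lower = better transfer)."""
    non_en = [v for k, v in acc.items() if k != "en"]
    return acc["en"] - sum(non_en) / len(non_en)

# Placeholder gold answers and model outputs for three languages.
references = {"en": ["4", "9"], "zh": ["4", "9"], "de": ["4", "9"]}
rl_preds   = {"en": ["4", "9"], "zh": ["4", "9"], "de": ["4", "8"]}
sft_preds  = {"en": ["4", "9"], "zh": ["4", "7"], "de": ["3", "8"]}

rl_acc  = accuracy_by_language(rl_preds, references)
sft_acc = accuracy_by_language(sft_preds, references)
print(generalization_gap(rl_acc), generalization_gap(sft_acc))
```

In this toy setup the RL checkpoint shows a smaller English-to-non-English gap than the SFT one, which is the kind of pattern the study reports at scale.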

📝 Abstract
Enhancing the complex reasoning capabilities of Large Language Models (LLMs) has attracted widespread attention. While reinforcement learning (RL) has shown superior performance for improving complex reasoning, its impact on cross-lingual generalization compared to Supervised Fine-Tuning (SFT) remains unexplored. We present the first systematic investigation into the cross-lingual reasoning generalization of RL and SFT. Using Qwen2.5-3B-Base as our foundation model, we conduct experiments on diverse multilingual reasoning benchmarks, including math, commonsense, and scientific reasoning. Our investigation yields two significant findings: (1) tuning with RL not only achieves higher accuracy but also demonstrates substantially stronger cross-lingual generalization than SFT; and (2) RL training on non-English data yields better overall performance and generalization than training on English data, an effect not observed with SFT. Furthermore, through comprehensive mechanistic analyses, we explore the factors underlying RL's superiority and its generalization across languages. Our results provide compelling evidence that RL equips models with more robust reasoning strategies, offering crucial guidance for more equitable and effective multilingual reasoning.
Problem

Research questions and friction points this paper is trying to address.

Investigates cross-lingual reasoning generalization of reinforcement learning versus supervised fine-tuning
Compares multilingual reasoning performance across math, commonsense, and scientific domains
Explores why reinforcement learning on non-English data yields more robust multilingual reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning enhances cross-lingual reasoning generalization
RL outperforms supervised fine-tuning in multilingual reasoning benchmarks
Non-English RL training yields superior performance and generalization
👥 Authors
Shulin Huang (Zhejiang University, Westlake University)
Yiran Ding (HDU, LLMMLSys)
Junshu Pan (Zhejiang University, Westlake University)
Yue Zhang (Westlake University)