LANG: Reinforcement Learning for Multilingual Reasoning with Language-Adaptive Hint Guidance

📅 2026-05-21
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the loss of reasoning quality on non-English tasks that arises when multilingual large language models are trained with reinforcement learning: enforcing input-language consistency hampers reasoning, while optimizing for reasoning lets outputs drift toward English. The authors propose LANG, a framework that uses language-conditioned hints to guide exploration, combined with a progressive decay schedule that gradually withdraws the hints and a language-adaptive switch that tailors the learning horizon to each language's difficulty. On multilingual mathematical reasoning benchmarks, LANG improves reasoning performance without compromising language consistency. The framework also generalizes beyond mathematics and yields more consistent language alignment across model layers.
📝 Abstract
Reinforcement learning has proven effective for enhancing multi-step reasoning in large language models (LLMs), yet its benefits have not fully translated to multilingual contexts. Existing methods struggle with a fundamental trade-off: prioritizing input-language consistency severely hampers reasoning quality, while prioritizing reasoning often leads to unintended language drift toward English. We address this challenge with LANG, a novel framework that leverages language-conditioned hints to guide exploration in non-English reasoning tasks. Our method incorporates two key mechanisms to prevent dependency on these hints: a progressive decay schedule that gradually withdraws scaffolding, and a language-adaptive switch that tailors learning horizons to specific language difficulties. Empirical results on challenging multilingual mathematical benchmarks reveal that LANG substantially enhances reasoning performance without compromising language consistency. Moreover, we show that our framework generalizes beyond mathematics, fostering more consistent language alignment across model layers.
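The abstract does not spell out how the decay schedule or the language-adaptive switch is defined, so the following is only a minimal sketch of one way the two ideas could fit together. It assumes a linear decay of the hint probability and a per-language horizon scaled by a difficulty score. The names `HintSchedule`, `base_horizon`, `difficulty`, and `build_prompt`, as well as the hint tag format, are hypothetical and not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class HintSchedule:
    """Hypothetical hint schedule: linear decay with a per-language horizon."""

    base_horizon: int        # training steps over which hints fade (assumed)
    start_prob: float = 1.0  # probability of attaching a hint at step 0 (assumed)

    def horizon(self, difficulty: float) -> int:
        # Assumption: harder languages (larger difficulty) keep hints longer.
        return max(1, round(self.base_horizon * (1.0 + difficulty)))

    def hint_prob(self, step: int, difficulty: float) -> float:
        # Linear decay from start_prob to 0 over the language-specific horizon.
        remaining = max(0.0, 1.0 - step / self.horizon(difficulty))
        return self.start_prob * remaining


def build_prompt(question: str, hint: str, lang: str, step: int,
                 schedule: HintSchedule, difficulty: float,
                 rng: random.Random) -> str:
    """Attach a language-tagged hint with the scheduled probability."""
    if rng.random() < schedule.hint_prob(step, difficulty):
        return f"{question}\n[hint:{lang}] {hint}"
    return question


schedule = HintSchedule(base_horizon=100)
rng = random.Random(0)
early = build_prompt("2 + 2 = ?", "Add the two numbers.", "sw", 0, schedule, 1.0, rng)
late = build_prompt("2 + 2 = ?", "Add the two numbers.", "sw", 500, schedule, 1.0, rng)
```

Under these assumptions, a language with `difficulty=1.0` still receives hints half the time at step 100, whereas a language with `difficulty=0.0` has already lost them by that point. The actual schedule and switching rule used by LANG may differ.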
Problem

Research questions and friction points this paper is trying to address.

multilingual reasoning
language drift
reinforcement learning
language consistency
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

language-adaptive hint guidance
reinforcement learning
multilingual reasoning
language drift mitigation
progressive decay schedule