AI Summary
This work addresses a trade-off that arises when reinforcement learning is used to improve reasoning in multilingual large language models: enforcing consistency with the input language degrades reasoning quality, while optimizing for reasoning alone lets the model's outputs drift toward English. To resolve it, the authors propose LANG, a framework that supplies language-conditioned hints to guide exploration on non-English reasoning tasks. To keep the model from becoming dependent on these hints, LANG combines a progressive decay schedule that gradually withdraws them with a language-adaptive switch that tailors the learning horizon to each language's difficulty. Experiments on multilingual mathematical reasoning benchmarks show that LANG substantially improves reasoning performance without sacrificing language consistency, and that its benefits extend beyond mathematics, yielding more consistent language alignment across model layers.
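The summary does not say how language drift is measured. As a rough illustration only (not the authors' metric), one coarse proxy is the share of letters in a response written in the input language's script. The function names, the `SCRIPTS` tuple, and the script-share idea are all assumptions; the proxy can flag drift toward English for non-Latin-script languages such as Russian or Japanese, but it cannot tell French from English.

```python
from __future__ import annotations

import unicodedata

# Hypothetical coarse script labels, matched against the start of each letter's Unicode name.
SCRIPTS = ("LATIN", "CYRILLIC", "GREEK", "ARABIC", "HEBREW", "DEVANAGARI",
           "THAI", "HIRAGANA", "KATAKANA", "HANGUL", "CJK")


def script_of(ch: str) -> str | None:
    """Return a coarse script label for a letter, or None for non-letters."""
    if not ch.isalpha():
        return None  # digits, punctuation, whitespace and combining marks are ignored
    name = unicodedata.name(ch, "")
    for script in SCRIPTS:
        if name.startswith(script):
            return script
    return "OTHER"


def script_consistency(text: str, target_script: str) -> float:
    """Share of letters in `text` written in `target_script` (0.0 if there are no letters)."""
    labels = [label for label in map(script_of, text) if label is not None]
    if not labels:
        return 0.0
    return sum(label == target_script for label in labels) / len(labels)
```

For example, a Russian answer that switches to English part-way, `script_consistency("Ответ: x = 4, so the answer is 4", "CYRILLIC")`, returns 5/19 (about 0.26): 5 Cyrillic letters against 14 Latin ones.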
Abstract
Reinforcement learning has proven effective for enhancing multi-step reasoning in large language models (LLMs), yet its benefits have not fully translated to multilingual contexts. Existing methods struggle with a fundamental trade-off: prioritizing input-language consistency severely hampers reasoning quality, while prioritizing reasoning often leads to unintended language drift toward English. We address this challenge with LANG, a novel framework that leverages language-conditioned hints to guide exploration in non-English reasoning tasks. Our method incorporates two key mechanisms to prevent the model from becoming dependent on these hints: a progressive decay schedule that gradually withdraws the scaffolding, and a language-adaptive switch that tailors the learning horizon to the difficulty of each language. Empirical results on challenging multilingual mathematical reasoning benchmarks show that LANG substantially enhances reasoning performance without compromising language consistency. Moreover, we show that our framework generalizes beyond mathematics, fostering more consistent language alignment across model layers.
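The abstract describes the two anti-dependency mechanisms only at a high level. The sketch below shows one way such scaffolding could be scheduled and is not the authors' implementation. It assumes a linear decay of the probability of prepending a hint to a prompt, and it models the language-adaptive switch as a per-language decay horizon that is stretched for languages with lower baseline accuracy. The names `hint_probability`, `language_horizons`, and `build_prompt`, the linear shape, and the `max_scale` rule are all hypothetical.

```python
from __future__ import annotations

import random


def hint_probability(step: int, horizon: int, p0: float = 1.0) -> float:
    """Chance of showing the hint at `step`: decays linearly from p0 to 0 over `horizon` steps."""
    if horizon <= 0:
        return 0.0
    return p0 * max(0.0, 1.0 - step / horizon)


def language_horizons(baseline_acc: dict[str, float], base_horizon: int,
                      max_scale: float = 3.0) -> dict[str, int]:
    """Hypothetical language-adaptive rule: lower baseline accuracy (treated as a
    harder language) stretches the decay horizon, up to `max_scale` times the base."""
    horizons = {}
    for lang, acc in baseline_acc.items():
        acc = min(max(acc, 0.0), 1.0)  # clamp accuracy to [0, 1]
        scale = 1.0 + (max_scale - 1.0) * (1.0 - acc)  # acc=1 -> 1x, acc=0 -> max_scale x
        horizons[lang] = int(round(base_horizon * scale))
    return horizons


def build_prompt(question: str, hint: str, step: int, horizon: int,
                 rng: random.Random) -> str:
    """Prepend the language-conditioned hint with the scheduled probability."""
    if rng.random() < hint_probability(step, horizon):
        return f"{hint}\n\n{question}"
    return question
```

With made-up baseline accuracies, `language_horizons({"fr": 0.6, "sw": 0.2}, base_horizon=1000)` returns `{"fr": 1800, "sw": 2600}`, so the harder language keeps its hints longer; once `step` reaches a language's horizon, `build_prompt` always returns the bare question. A linear ramp is used here only because it is the simplest monotone schedule; a cosine or step-wise withdrawal would fit the same interface.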