TAPO: Translation Augmented Policy Optimization for Multilingual Mathematical Reasoning

📅 2026-03-26
🤖 AI Summary
This work addresses the performance drop that large language models show on multilingual mathematical reasoning, which the authors attribute to limited language comprehension. To mitigate this, they propose TAPO, a reinforcement learning framework built on GRPO that uses English as a pivot and follows an understand-then-reason paradigm. A step-level relative advantage mechanism decouples the understanding step from the reasoning step, so that translation quality rewards can be integrated into reasoning optimization and the two can be trained jointly without objective conflicts. According to the authors, TAPO outperforms baseline methods on both multilingual mathematical reasoning and translation tasks, improves reasoning in non-English languages, and generalizes to unseen languages and out-of-domain tasks.

📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable proficiency in English mathematical reasoning, yet a significant performance disparity persists in multilingual contexts, largely attributed to deficiencies in language understanding. To bridge this gap, we introduce Translation-Augmented Policy Optimization (TAPO), a novel reinforcement learning framework built upon GRPO. TAPO enforces an explicit alignment strategy where the model leverages English as a pivot and follows an understand-then-reason paradigm. Crucially, we employ a step-level relative advantage mechanism that decouples understanding from reasoning, allowing the integration of translation quality rewards without introducing optimization conflicts. Extensive experiments reveal that TAPO effectively synergizes language understanding with reasoning capabilities and is compatible with various models. It outperforms baseline methods in both multilingual mathematical reasoning and translation tasks, while generalizing well to unseen languages and out-of-domain tasks.
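
The step-level relative advantage idea can be illustrated with a minimal sketch. The abstract does not give the formula, so everything below is an assumption made for illustration: each sampled response is taken to have two steps (an English translation, then the reasoning), each step gets its own scalar reward, and each reward is normalized within the sampled group in the usual GRPO style. The function names, the two-step split, and the use of the population standard deviation are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative(rewards, eps=1e-8):
    """GRPO-style normalization within one group: (r - mean) / (std + eps).

    Assumption: population standard deviation; eps guards against a
    zero-variance group where every sample earned the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def step_level_advantages(translation_rewards, reasoning_rewards):
    """Return one (translation_advantage, reasoning_advantage) pair per sample.

    Hypothetical decoupling: each step is normalized only against the same
    step of the other samples in the group, so a weak translation score does
    not shift the advantage assigned to the reasoning tokens, and vice versa.
    """
    if len(translation_rewards) != len(reasoning_rewards):
        raise ValueError("both reward lists must cover the same group of samples")
    return list(zip(group_relative(translation_rewards),
                    group_relative(reasoning_rewards)))


if __name__ == "__main__":
    # Illustrative rewards for a group of four sampled responses (made-up values).
    translation_rewards = [0.9, 0.5, 0.7, 0.3]   # e.g. a translation quality score
    reasoning_rewards = [1.0, 0.0, 1.0, 0.0]     # e.g. final-answer correctness
    for t_adv, r_adv in step_level_advantages(translation_rewards, reasoning_rewards):
        print(f"translation advantage {t_adv:+.3f}, reasoning advantage {r_adv:+.3f}")
```

In a policy-gradient update under this sketch, the translation advantage would weight the tokens of the translation step and the reasoning advantage would weight the tokens of the reasoning step. Summing the two rewards into one scalar before normalizing would instead couple the steps, which is the kind of optimization conflict the abstract says the mechanism avoids.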

Problem

Research questions and friction points this paper is trying to address.

multilingual mathematical reasoning
language understanding
performance disparity
large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Translation-Augmented Policy Optimization
multilingual mathematical reasoning
step-level relative advantage
understand-then-reason paradigm
reinforcement learning