Align to the Pivot: Dual Alignment with Self-Feedback for Multilingual Math Reasoning

πŸ“… 2026-01-25
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the performance degradation of large language models in multilingual mathematical reasoning, particularly for low-resource languages, stemming from misalignment between language understanding and reasoning capabilities. To mitigate this, the authors propose PASMR, a novel approach that introduces a pivot-language-based dual-alignment self-feedback mechanism. The method first translates the target-language problem into a high-resource pivot language to align reasoning patterns, then leverages the pivot-language reasoning output to supervise the reasoning process in the target languageβ€”all without requiring external answers or reward signals. By integrating multilingual translation, cross-lingual reasoning transfer, and self-supervised fine-tuning, PASMR substantially enhances overall performance on multilingual math reasoning tasks, with especially pronounced gains for low-resource languages.

πŸ“ Abstract
Despite the impressive reasoning abilities demonstrated by large language models (LLMs), empirical evidence indicates that they are not as language-agnostic as expected, leading to performance declines in multilingual settings, especially for low-resource languages. We attribute the decline to the model's inconsistent alignment between multilingual understanding and reasoning. To address this, we present Pivot-Aligned Self-Feedback Multilingual Reasoning (PASMR), which aims to improve the alignment of multilingual math reasoning abilities in LLMs. This approach designates the model's primary language as the pivot language. During training, the model first translates questions into the pivot language to facilitate better alignment of reasoning patterns. The reasoning process in the target language is then supervised by the pivot language's reasoning answers, establishing a cross-lingual self-feedback mechanism without relying on external correct answers or reward models. Extensive experimental results demonstrate that our method enhances both the model's understanding of questions and its reasoning capabilities, leading to notable task improvements.
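The self-feedback loop described in the abstract can be sketched as follows. This is a toy illustration inferred from the summary, not the paper's implementation: the function names (`translate_to_pivot`, `reason`, `self_feedback_label`) and the lookup-table "model" are hypothetical stand-ins for the LLM's translation and reasoning steps.

```python
# Illustrative sketch of a PASMR-style pivot-aligned self-feedback signal.
# All functions below are hypothetical stand-ins for model calls.

def translate_to_pivot(question_tgt):
    # Stand-in for the model translating the target-language question
    # into the pivot (primary) language, e.g. Spanish -> English.
    toy_translations = {"¿Cuánto es 2 + 3?": "What is 2 + 3?"}
    return toy_translations[question_tgt]

def reason(question):
    # Stand-in for the model's reasoning in the pivot language.
    if question == "What is 2 + 3?":
        return 5
    return None

def self_feedback_label(question_tgt, answer_tgt):
    """Use the pivot-language answer as the supervision signal for the
    target-language reasoning, without any external gold answer or
    reward model: agreement with the pivot answer acts as the label."""
    pivot_question = translate_to_pivot(question_tgt)
    pivot_answer = reason(pivot_question)
    # Training would then reinforce target-language reasoning traces
    # whose final answer matches the pivot-language answer.
    return answer_tgt == pivot_answer

print(self_feedback_label("¿Cuánto es 2 + 3?", 5))  # True: answers agree
```

The key point the sketch captures is that the feedback is self-generated: the model's own pivot-language answer, rather than ground truth, decides which target-language reasoning traces are reinforced.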
Problem

Research questions and friction points this paper is trying to address.

multilingual reasoning
language alignment
low-resource languages
mathematical reasoning
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

multilingual reasoning
pivot language
self-feedback
cross-lingual alignment
mathematical reasoning
πŸ”Ž Similar Papers
No similar papers found.