Beyond Input Understanding: Diagnosing Multilingual Mathematical Reasoning with Directed Acyclic Trace Graphs

📅 2026-05-26
🤖 AI Summary
This work addresses the sharp drop in mathematical reasoning accuracy that large reasoning models show in low- and medium-resource languages. The authors argue that this drop stems not only from poor comprehension of the input but also from language-induced interference during reasoning execution itself: even when the problem is stated in English, forcing the model to reason in another language can substantially lower accuracy. To diagnose this effect, they propose DATG (Directed Acyclic Trace Graph), a framework that maps reasoning traces to language-independent mathematical anchors and the dependencies between them, forming a directed acyclic graph. Aligning target-language traces with reference DAGs enables fine-grained diagnosis of coverage gaps (required anchors that never appear), dependency distortions (dependency edges that are not respected), and harmful mathematical actions. Building on this diagnosis, the study introduces two simple test-time controls, Loop-Retry and Formula-Retry, that target the failure modes DATG exposes. Experiments on the Qwen3 model series across 12 languages show that non-English reasoning, especially in low-resource languages, suffers from reduced anchor coverage and weaker dependency fidelity, and that the two controls consistently improve target-language reasoning performance in low-resource languages.
📝 Abstract
Large reasoning models (LRMs) achieve strong mathematical reasoning performance in English, but remain much less reliable in many low- and medium-resource languages. This gap is often explained as a failure to understand non-English problem statements. We show that this view is incomplete: even when the problem is given in English, controlling the model's reasoning language can substantially reduce accuracy, suggesting that language also affects reasoning execution itself. To study this effect, we introduce DATG, a Directed Acyclic Trace Graph framework that maps reasoning traces to language-independent mathematical anchors and dependencies. This allows us to align target-language traces with reference DAGs and measure whether they cover required mathematical nodes, respect dependency edges, and avoid harmful mathematical actions. Experiments on the Qwen3 series across 12 languages show that non-English reasoning often suffers from reduced anchor coverage and weaker dependency fidelity, especially in low-resource languages. Motivated by this diagnosis, we propose Loop-Retry and Formula-Retry, two simple test-time controls targeting DATG-exposed failure modes, and show that they consistently improve target-language reasoning performance in low-resource languages.
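The abstract describes DATG only at a high level: traces are mapped to anchors, aligned against a reference DAG, and scored on anchor coverage, dependency fidelity, and harmful actions. The sketch below illustrates what such scores could look like under simplifying assumptions that are not taken from the paper: anchor extraction has already been done and each anchor is a normalized string, a dependency edge counts as respected when the premise first appears before the conclusion, and harmful actions come from a caller-supplied set. The names `ReferenceDAG`, `anchor_coverage`, `dependency_fidelity`, and `harmful_action_count` are hypothetical, and the paper's actual alignment and scoring may differ.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class ReferenceDAG:
    """Hypothetical reference graph: anchor ids plus (premise, conclusion) edges."""

    nodes: set[str]
    edges: set[tuple[str, str]] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Every edge must connect two known anchors.
        for u, v in self.edges:
            if u not in self.nodes or v not in self.nodes:
                raise ValueError(f"edge {(u, v)} references an unknown anchor")
        # Map each anchor to its predecessors; static_order() raises
        # graphlib.CycleError (a ValueError subclass) if the graph has a cycle.
        preds: dict[str, set[str]] = {n: set() for n in self.nodes}
        for u, v in self.edges:
            preds[v].add(u)
        list(TopologicalSorter(preds).static_order())


def first_positions(trace: list[str]) -> dict[str, int]:
    """Index of the first occurrence of each anchor in an extracted trace."""
    positions: dict[str, int] = {}
    for i, anchor in enumerate(trace):
        positions.setdefault(anchor, i)
    return positions


def anchor_coverage(dag: ReferenceDAG, trace: list[str]) -> float:
    """Fraction of required reference anchors that appear in the trace."""
    if not dag.nodes:
        return 1.0
    return len(dag.nodes & set(trace)) / len(dag.nodes)


def dependency_fidelity(dag: ReferenceDAG, trace: list[str]) -> float:
    """Among edges whose endpoints both appear, fraction where the premise comes first."""
    pos = first_positions(trace)
    checkable = [(u, v) for u, v in dag.edges if u in pos and v in pos]
    if not checkable:
        return 1.0  # vacuously satisfied: no edge can be checked
    respected = sum(1 for u, v in checkable if pos[u] < pos[v])
    return respected / len(checkable)


def harmful_action_count(trace: list[str], harmful: set[str]) -> int:
    """Number of trace steps that match a caller-supplied set of harmful anchors."""
    return sum(1 for anchor in trace if anchor in harmful)


if __name__ == "__main__":
    # Toy problem "2x + 3 = 11" with a hand-built reference DAG.
    dag = ReferenceDAG(
        nodes={"2x+3=11", "2x=8", "x=4"},
        edges={("2x+3=11", "2x=8"), ("2x=8", "x=4")},
    )
    reordered = ["2x+3=11", "x=4", "2x=8"]  # all anchors present, one edge inverted
    print(anchor_coverage(dag, reordered))      # 1.0
    print(dependency_fidelity(dag, reordered))  # 0.5
    wrong = ["2x+3=11", "2x=14", "x=7"]  # added 3 instead of subtracting it
    print(round(anchor_coverage(dag, wrong), 2))   # 0.33
    print(harmful_action_count(wrong, {"2x=14"}))  # 1
```

Keying the ordering check on first occurrences keeps the sketch simple; a realistic alignment would also have to handle anchors that a trace restates, rewrites in a different surface form, or recomputes later.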
Problem

Research questions and friction points this paper is trying to address.

multilingual mathematical reasoning
reasoning execution
low-resource languages
language dependency
reasoning reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Directed Acyclic Trace Graph
multilingual mathematical reasoning
reasoning execution
test-time control
language-independent anchors