🤖 AI Summary
Language bias in multilingual training corpora induces semantic drift and logical inconsistency in small-parameter (<10B) large language models during complex mathematical reasoning. To address this, we propose the Cross-Lingual Consistency (CLC) reasoning framework: it generates chain-of-thought rationales in parallel across 11 languages and applies cross-lingual majority voting to aggregate outputs, thereby mitigating corpus-induced biases and expanding the search over the global solution space beyond monolingual limitations. CLC establishes a multilingual collaborative reasoning paradigm and is the first to employ cross-lingual ensemble methods to stabilize reasoning in small models. On the CMATH and MGSM benchmarks, CLC achieves absolute accuracy gains of 6.0–9.5% and 4.1–18.5% over self-consistency, respectively, demonstrating that cross-lingual collaboration significantly enhances the robustness of small models in complex mathematical reasoning.
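The aggregation step described above can be sketched as a simple majority vote over the final answers parsed from each language's chain-of-thought run. This is a minimal illustration, not the paper's implementation: the function name `cross_lingual_majority_vote` and the mock per-language answers are hypothetical, and a real pipeline would call an LLM once per language and extract the numeric answer from each rationale.

```python
from collections import Counter

def cross_lingual_majority_vote(answers_by_language):
    """Aggregate final answers from per-language chain-of-thought
    samples by majority voting (the CLC-style aggregation step)."""
    # Pool every sampled answer across all languages.
    pool = [ans for answers in answers_by_language.values() for ans in answers]
    # The most frequent final answer wins the cross-lingual vote.
    winner, _count = Counter(pool).most_common(1)[0]
    return winner

# Hypothetical outputs: each language contributes the answer parsed
# from its own CoT rationale for the same math problem.
sampled = {
    "en": ["42"],
    "zh": ["42"],
    "fr": ["41"],  # a language-specific reasoning error is outvoted
    "de": ["42"],
}
print(cross_lingual_majority_vote(sampled))  # prints 42
```

Because the vote is taken across languages rather than across repeated samples in one language, a bias that systematically skews reasoning in a single language's corpus is less likely to dominate the final answer.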
📝 Abstract
Chain-of-thought (CoT) has emerged as a critical mechanism for enhancing reasoning capabilities in large language models (LLMs), with self-consistency demonstrating notable promise in boosting performance. However, inherent linguistic biases in multilingual training corpora frequently cause semantic drift and logical inconsistencies, especially in sub-10B parameter LLMs handling complex inference tasks. To overcome these constraints, we propose the Cross-Lingual Consistency (CLC) framework, an innovative inference paradigm that integrates multilingual reasoning paths through majority voting to elevate LLMs' reasoning capabilities. Empirical evaluations on the CMATH dataset reveal CLC's superiority over the conventional self-consistency method, delivering 9.5%, 6.5%, and 6.0% absolute accuracy gains for DeepSeek-Math-7B-Instruct, Qwen2.5-Math-7B-Instruct, and Gemma2-9B-Instruct respectively. Expanding CLC's linguistic scope to 11 diverse languages yields two synergistic benefits: 1) neutralizing linguistic biases in multilingual training corpora through multilingual ensemble voting, 2) escaping monolingual reasoning traps by exploring the broader multilingual solution space. These dual benefits empirically enable more globally optimal reasoning paths compared to monolingual self-consistency baselines, as evidenced by the 4.1%–18.5% accuracy gains using Gemma2-9B-Instruct on the MGSM dataset.