SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment

📅 2025-01-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the performance degradation of large language models (LLMs) in multilingual reasoning, as well as the high computational cost and catastrophic forgetting induced by existing two-stage fine-tuning approaches, this paper proposes SLAM, a selective language alignment method. The authors first observe that linguistic representations are concentrated in the lower transformer layers; leveraging this insight, they design a hierarchical alignment strategy that selectively fine-tunes only the feed-forward (FFN) sub-layers of six lower layers (6.5–8% of total parameters) in 7B/13B models. SLAM combines layer importance analysis, single-stage end-to-end training, multilingual instruction tuning, and cross-lingual representation alignment. Evaluated across ten languages, SLAM consistently outperforms all strong baselines in average multilingual reasoning accuracy, achieves a 4.1–11.9× training speedup, significantly mitigates catastrophic forgetting, and improves parameter efficiency, demonstrating superior effectiveness and scalability for multilingual LLM adaptation.
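The claim that language-specific representations concentrate in the lower layers can be probed directly. Below is a minimal, illustrative sketch (not the paper's own analysis): it mean-pools each layer's hidden states for a parallel English/German sentence pair and reports per-layer cross-lingual cosine similarity. The checkpoint name, sentence pair, and pooling choice are all assumptions for demonstration.

```python
# Illustrative layer-wise probe (not the paper's released analysis):
# compare per-layer hidden states for a parallel sentence pair to see
# where language-specific information is concentrated.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

NAME = "meta-llama/Llama-2-7b-hf"  # assumption: any HF causal LM works here
tok = AutoTokenizer.from_pretrained(NAME)
model = AutoModelForCausalLM.from_pretrained(NAME, output_hidden_states=True)
model.eval()

def layer_means(text: str) -> list[torch.Tensor]:
    """Mean-pool every layer's hidden states for one sentence."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # out.hidden_states: one (1, seq_len, dim) tensor per layer (plus embeddings)
    return [h.mean(dim=1).squeeze(0) for h in out.hidden_states]

en = layer_means("The cat sits on the mat.")
de = layer_means("Die Katze sitzt auf der Matte.")

# Lower similarity in early layers would indicate language-specific
# processing there, consistent with the paper's finding.
for i, (he, hd) in enumerate(zip(en, de)):
    sim = torch.cosine_similarity(he, hd, dim=0).item()
    print(f"layer {i:2d}: cross-lingual cosine = {sim:.3f}")
```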

📝 Abstract
Despite the significant improvements achieved by large language models (LLMs) in English reasoning tasks, these models continue to struggle with multilingual reasoning. Recent studies leverage a full-parameter, two-stage training paradigm to teach models to first understand non-English questions and then reason. However, this method suffers from both substantial computational cost and catastrophic forgetting. The fundamental cause is that, with the primary goal of enhancing multilingual comprehension, an excessive number of irrelevant layers and parameters are tuned during the first stage. Given our finding that the representation learning of languages is concentrated in the lower layers, we propose an efficient multilingual reasoning alignment approach that precisely identifies and fine-tunes the layers responsible for handling multilingualism. Experimental results show that our method, SLAM, tunes only the feed-forward sub-layers of 6 layers, comprising 6.5–8% of all parameters in 7B and 13B LLMs, and achieves superior average performance over all strong baselines across 10 languages. Meanwhile, SLAM involves only one training stage, reducing training time by 4.1–11.9× compared with the two-stage method.
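To make the parameter budget concrete, here is a minimal sketch of the selective fine-tuning setup the abstract describes: freeze everything, then unfreeze only the feed-forward (MLP) sub-layers of the six lowest transformer blocks. It assumes a LLaMA-style decoder from Hugging Face transformers; the module paths (`model.model.layers[i].mlp`), the checkpoint, and the choice to tune all FFN projections are assumptions, not taken from the paper's released code.

```python
# Minimal sketch of selective FFN fine-tuning (assumes a LLaMA-style
# decoder; module names and the checkpoint are illustrative).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Freeze every parameter first.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the feed-forward (MLP) sub-layers of the six lowest
# transformer blocks, where language representations are reported
# to be concentrated.
NUM_ALIGN_LAYERS = 6
for block in model.model.layers[:NUM_ALIGN_LAYERS]:
    for param in block.mlp.parameters():
        param.requires_grad = True

# Sanity check: report the trainable fraction (the exact percentage
# depends on the architecture and on which FFN projections are tuned).
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable / total:.1%}")
```

Training then proceeds in a single end-to-end stage on multilingual instruction data, with the frozen upper layers preserving the model's original reasoning ability, which is how the method avoids both the second training stage and catastrophic forgetting.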
Problem

Research questions and friction points this paper is trying to address.

Multilingual Reasoning
Resource Consumption
Knowledge Forgetting

Innovation

Methods, ideas, or system contributions that make the work stand out.

SLAM
Multilingual Inference
Efficient Training
👥 Authors

Yuchun Fan
Northeastern University

Yongyu Mu
Northeastern University
Multilingualism · Machine Translation · Efficient Models

Yilin Wang
NLP Lab, School of Computer Science and Engineering, Northeastern University, Shenyang, China

Lei Huang
Harbin Institute of Technology, Harbin, China

Junhao Ruan
NLP Lab, School of Computer Science and Engineering, Northeastern University, Shenyang, China

Bei Li
Meituan LLM Team
Machine Translation · Deep Learning · Large Language Models

Tong Xiao
NLP Lab, School of Computer Science and Engineering, Northeastern University, Shenyang, China; NiuTrans Research, Shenyang, China

Shujian Huang
School of Computer Science, Nanjing University
Natural Language Processing · Machine Translation · Multilingualism · Large Language Models

Xiaocheng Feng
Harbin Institute of Technology
NLP · Deep Learning · Machine Learning

Jingbo Zhu
Northeastern University, China
Machine Translation · Language Parsing · Natural Language Processing