CM-Align: Consistency-based Multilingual Alignment for Large Language Models

📅 2025-09-10
🤖 AI Summary
Current multilingual alignment of large language models (LLMs) is hindered by the poor quality of preference data: (i) using raw English responses as references introduces low-quality samples, and (ii) heuristic cross-lingual pairing yields preference pairs with high bias and noise. To address this, we propose a consistency-guided framework for multilingual preference construction. First, high-quality English references are selected based on semantic consistency among model responses. Then, robust cross-lingual preference pairs are generated via cross-lingual consistency evaluation and employed for Direct Preference Optimization (DPO). Our method achieves significant improvements over strong baselines across three mainstream LLMs and three representative multilingual tasks, including instruction following, translation, and reasoning. Results demonstrate that high-quality, consistency-driven preference data is critical for effective multilingual alignment. Moreover, our framework establishes a scalable, principled paradigm for constructing multilingual preference datasets, advancing the reliability and generalizability of alignment methods beyond English.
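
For readers unfamiliar with DPO, the per-pair objective that consumes a (chosen, rejected) preference pair can be sketched as follows. This is the standard DPO loss, not code from the paper; the function name, the argument names, and the default `beta` are illustrative assumptions, and the inputs are assumed to be summed log-probabilities of each response under the trained policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (illustrative sketch).

    Each argument is the log-probability of a full response given the prompt.
    beta scales how strongly the policy is pushed away from the reference.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # loss = -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# Raising the chosen response's log-probability lowers the loss.
improved = dpo_loss(-10.0, -15.0, -12.0, -15.0)
```

The loss depends only on the difference between the two policy-versus-reference log-ratios, which is why the quality of the chosen and rejected responses directly shapes the training signal.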

๐Ÿ“ Abstract
Current large language models (LLMs) generally show a significant gap in alignment performance between English and other languages. To bridge this gap, existing research typically uses the model's English responses as references to select the best and worst responses in other languages, which are then used for Direct Preference Optimization (DPO) training. However, we argue that two limitations of current methods produce noisy multilingual preference data and, in turn, limit alignment performance: 1) Not all English responses are of high quality, and using a low-quality response as a reference may mislead alignment for other languages. 2) Current methods usually rely on biased or heuristic approaches to construct multilingual preference pairs. To address these limitations, we design CM-Align, a consistency-based data selection method that constructs high-quality multilingual preference data for improving multilingual alignment. Specifically, our method has two parts: consistency-guided English reference selection and cross-lingual consistency-based multilingual preference data construction. Experimental results on three LLMs and three common tasks demonstrate the effectiveness and superiority of our method, which further indicates the necessity of constructing high-quality preference data.
Problem

Research questions and friction points this paper is trying to address.

Bridging performance gap in multilingual LLM alignment
Addressing noisy multilingual preference data issues
Improving cross-lingual consistency in preference optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Consistency-based multilingual data selection method
Cross-lingual consistency for preference construction
Quality-focused English reference selection approach
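
The two stages named above can be illustrated with a minimal sketch. It is not the authors' implementation: the page does not specify the consistency metric, so a toy token-overlap (Jaccard) score stands in for it, and the non-English candidates are assumed to have already been mapped into English (for example by back-translation) so they can be compared with the English reference. All function names here are hypothetical.

```python
from typing import Callable, List, Tuple

Similarity = Callable[[str, str], float]


def jaccard(a: str, b: str) -> float:
    """Toy similarity: token-set overlap. A stand-in for a real semantic metric."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def select_reference(responses: List[str], sim: Similarity = jaccard) -> str:
    """Stage 1 (assumed form): keep the English response that agrees most,
    on average, with the other sampled English responses."""
    if len(responses) < 2:
        raise ValueError("need at least two sampled responses")

    def mean_agreement(i: int) -> float:
        scores = [sim(responses[i], r) for j, r in enumerate(responses) if j != i]
        return sum(scores) / len(scores)

    best = max(range(len(responses)), key=mean_agreement)
    return responses[best]


def build_preference_pair(reference: str, candidates: List[str],
                          sim: Similarity = jaccard) -> Tuple[str, str]:
    """Stage 2 (assumed form): rank candidates by consistency with the
    reference and return (chosen, rejected) as the most and least consistent."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates")
    ranked = sorted(candidates, key=lambda c: sim(reference, c))
    return ranked[-1], ranked[0]


english_samples = [
    "Paris is the capital of France",
    "The capital of France is Paris",
    "France has no capital city",
]
reference = select_reference(english_samples)

# Hypothetical back-translations of responses generated in another language.
back_translated = [
    "The capital of France is Paris",
    "France is a country in Europe",
    "I do not know",
]
chosen, rejected = build_preference_pair(reference, back_translated)
```

Passing the similarity function as a parameter keeps the selection logic separate from the metric, so an embedding-based or translation-based score could be swapped in without changing either stage.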
Authors
Xue Zhang
Key Laboratory of Big Data & Artificial Intelligence in Transportation, Beijing Jiaotong University, Ministry of Education; School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China
Yunlong Liang
WeChat
Natural Language Processing (NLP)
Fandong Meng
WeChat AI, Tencent
Machine Translation, Natural Language Processing
Songming Zhang
Beijing Jiaotong University
natural language processing, text generation, machine translation
Yufeng Chen
Key Laboratory of Big Data & Artificial Intelligence in Transportation, Beijing Jiaotong University, Ministry of Education; School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China
Jinan Xu
Professor of School of Computer and Information Technology, Beijing Jiaotong University
NLP, Machine Translation, LLM
Jie Zhou
Pattern Recognition Center, WeChat AI, Tencent Inc, China