Align Once, Benefit Multilingually: Enforcing Multilingual Consistency for LLM Safety Alignment

📅 2026-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of cross-lingual safety alignment in multilingual large language models, which suffer from high resource demands and a scarcity of high-quality supervision data for low-resource languages, undermining cross-lingual consistency. The authors propose a plug-and-play Multilingual Consistency (MLC) loss that enhances directional alignment among multilingual representation vectors. By leveraging only multilingual prompt variants—without requiring response-level supervision in low-resource languages—the method achieves cross-lingual safety alignment within a single alignment update. This approach is the first to ensure multilingual safety consistency in a single alignment pass, substantially reducing reliance on annotated data for low-resource languages. It consistently improves multilingual safety across diverse model architectures and alignment paradigms while minimally affecting general performance, demonstrating strong cross-lingual generalization capabilities.

📝 Abstract
The widespread deployment of large language models (LLMs) across linguistic communities necessitates reliable multilingual safety alignment. However, recent efforts to extend alignment to other languages often require substantial resources, either through large-scale, high-quality supervision in the target language or through pairwise alignment with high-resource languages, which limits scalability. In this work, we propose a resource-efficient method for improving multilingual safety alignment. We introduce a plug-and-play Multi-Lingual Consistency (MLC) loss that can be integrated into existing monolingual alignment pipelines. By improving collinearity between multilingual representation vectors, our method encourages directional consistency at the multilingual semantic level in a single update. This allows simultaneous alignment across multiple languages using only multilingual prompt variants without requiring additional response-level supervision in low-resource languages. We validate the proposed method across different model architectures and alignment paradigms, and demonstrate its effectiveness in enhancing multilingual safety with limited impact on general model utility. Further evaluation across languages and tasks indicates improved cross-lingual generalization, suggesting the proposed approach as a practical solution for multilingual consistency alignment under limited supervision.
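The abstract describes the MLC loss as improving collinearity between multilingual representation vectors so that prompt variants in different languages point in the same direction. A minimal sketch of what such a collinearity term could look like is below; the pooling, pairing, and weighting choices are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def mlc_loss(reps):
    """Hypothetical sketch of a Multi-Lingual Consistency (MLC) loss.

    reps: array of shape (n_langs, d) -- one pooled representation per
    language variant of the same prompt (pooling strategy assumed).
    Penalises deviation from perfect pairwise cosine similarity, i.e.
    encourages all variants to be collinear in representation space.
    """
    # L2-normalise each language's representation vector.
    normed = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    # Pairwise cosine similarities between all language pairs.
    sims = normed @ normed.T
    n = len(reps)
    # Average over off-diagonal pairs; the loss is 0 when every
    # representation points in exactly the same direction.
    mean_pairwise = (sims.sum() - np.trace(sims)) / (n * (n - 1))
    return 1.0 - mean_pairwise
```

In a plug-and-play setting, a term like this would presumably be added with a weighting coefficient to the existing monolingual alignment objective (e.g. an SFT or preference-optimization loss), requiring only multilingual prompt variants rather than response-level supervision.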
Problem

Research questions and friction points this paper is trying to address.

multilingual safety alignment
large language models
cross-lingual generalization
low-resource languages
alignment scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual Safety Alignment
Consistency Loss
Resource-Efficient Alignment
Cross-lingual Generalization
Plug-and-Play Module
Yuyan Bu
Beijing Academy of Artificial Intelligence
Xiaohao Liu
National University of Singapore
Multimodal Learning · Information Retrieval
ZhaoXing Ren
Beijing Academy of Artificial Intelligence
Yaodong Yang
Boya (博雅) Assistant Professor at Peking University
Reinforcement Learning · AI Alignment · Embodied AI
Juntao Dai
Beijing Academy of Artificial Intelligence, Institute for Artificial Intelligence, Peking University