Unlocking LLM Safeguards for Low-Resource Languages via Reasoning and Alignment with Minimal Training Data

📅 2025-10-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor interpretability, weak generalization, and heavy data dependence of large-model safety detection methods for low-resource languages, this paper proposes ConsistentGuard, a few-shot multilingual safety defense framework that leverages chain-of-thought (CoT) reasoning enhancement and cross-lingual representation alignment. ConsistentGuard integrates CoT-driven interpretable classification, multilingual semantic alignment, and meta-learning to enable cross-lingual malicious request detection with extremely limited supervision (only 1,000 annotated samples). The authors introduce a multilingual safety evaluation benchmark extension covering six low-resource languages across three established benchmarks. Experiments demonstrate that ConsistentGuard consistently outperforms fully supervised baselines in detection accuracy, decision interpretability, and zero-/few-shot cross-lingual transfer. The framework establishes a new paradigm for trustworthy AI deployment in low-resource settings, delivering efficiency, transparency, and scalability without compromising robustness.
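The paper's actual prompt format is not shown on this page; the following is a hypothetical sketch of what CoT-driven interpretable safety classification can look like in practice. All names, the template wording, and the label scheme are illustrative assumptions, not taken from ConsistentGuard.

```python
# Hypothetical sketch of CoT-driven safety classification.
# The template, labels, and function names are illustrative only.

COT_TEMPLATE = (
    "You are a multilingual safety classifier.\n"
    "Request: {request}\n"
    "Think step by step:\n"
    "1. What is the request literally asking for?\n"
    "2. Could fulfilling it cause harm?\n"
    "3. Conclude with 'Label: SAFE' or 'Label: UNSAFE'.\n"
)

def build_prompt(request: str) -> str:
    """Fill the CoT template with a user request."""
    return COT_TEMPLATE.format(request=request)

def parse_label(model_output: str) -> str:
    """Extract the final SAFE/UNSAFE verdict from a CoT response."""
    for line in reversed(model_output.strip().splitlines()):
        if line.startswith("Label:"):
            return line.split(":", 1)[1].strip()
    return "UNKNOWN"

# The reasoning trace preceding the label serves as the
# human-readable explanation for the classification decision.
print(parse_label("1. Asks for a recipe.\n2. No harm.\nLabel: SAFE"))
```

The key point of such a design is that the model's intermediate reasoning is preserved alongside the verdict, which is what makes the classification interpretable rather than a bare score.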

📝 Abstract
Recent advances in LLMs have enhanced AI capabilities, but also increased the risk posed by malicious requests, highlighting the need for effective LLM safeguards to detect such queries. Existing approaches largely rely on classifier-based methods that lack interpretability and perform poorly on low-resource languages. To address these limitations, we propose ConsistentGuard, a novel reasoning-based multilingual safeguard, which enhances explainability via reasoning and boosts knowledge transfer between languages through alignment. With only 1,000 training samples, our method demonstrates superior performance on three datasets across six languages, outperforming larger models trained with significantly more data, and exhibits strong interpretability and generalization ability. We also contribute a multilingual benchmark extension and release our code to support future research.
Problem

Research questions and friction points this paper is trying to address.

Detecting malicious queries in low-resource languages effectively
Improving interpretability of LLM safeguards through reasoning mechanisms
Enhancing cross-language knowledge transfer with minimal training data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reasoning-based multilingual safeguard for LLM protection
Alignment technique enhances cross-language knowledge transfer
Superior multilingual performance from only 1,000 training samples
Authors

Zhuowei Chen (Bytedance)
Bowei Zhang (Peking University)
Nankai Lin (Guangdong University of Foreign Studies, China; Guangzhou Key Laboratory of Multilingual Intelligent Processing, China)
Tian Hou (Guangdong University of Foreign Studies, China)
Lianxi Wang (Guangdong University of Foreign Studies, China; Guangzhou Key Laboratory of Multilingual Intelligent Processing, China)