🤖 AI Summary
To address the poor interpretability, weak generalization, and heavy data dependency of large-model safety detection methods for low-resource languages, this paper proposes ConsistentGuard, a few-shot multilingual safety defense framework that combines chain-of-thought (CoT) reasoning enhancement with cross-lingual representation alignment. ConsistentGuard integrates CoT-driven interpretable classification, multilingual semantic alignment, and meta-learning to enable cross-lingual malicious-request detection under extremely limited supervision (only 1,000 annotated samples). We also introduce the first multilingual safety-evaluation benchmark extension covering six low-resource languages across three established benchmarks. Experiments show that ConsistentGuard consistently outperforms fully supervised baselines in detection accuracy, decision interpretability, and zero-/few-shot cross-lingual transfer. The framework offers an efficient, transparent, and scalable path toward trustworthy AI deployment in low-resource settings without compromising robustness.
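The cross-lingual representation alignment idea can be illustrated with a minimal sketch. This is a hypothetical toy, not the paper's implementation: an alignment objective that pulls embeddings of parallel prompts in different languages toward each other via cosine similarity, so a classifier trained on one language transfers to others.

```python
import numpy as np

def cosine_alignment_loss(src, tgt):
    """Mean (1 - cosine similarity) over aligned embedding pairs.

    src, tgt: (n, d) arrays of embeddings for parallel prompts in two
    languages (row i of src and row i of tgt describe the same request).
    Returns 0.0 when aligned pairs point in identical directions.
    """
    src_n = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt_n = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    cos = np.sum(src_n * tgt_n, axis=1)  # per-pair cosine similarity
    return float(np.mean(1.0 - cos))

# Toy check: identical embeddings incur zero alignment loss,
# orthogonal embeddings incur maximal-direction mismatch (loss 1.0).
e = np.array([[1.0, 0.0], [0.0, 1.0]])
print(cosine_alignment_loss(e, e))
```

In a real system this term would be added to the classification loss during fine-tuning; the embedding model, pairing data, and loss weighting here are all left unspecified and are assumptions of this sketch.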
📝 Abstract
Recent advances in large language models (LLMs) have expanded AI capabilities but also increased the risk posed by malicious requests, highlighting the need for effective LLM safeguards that detect such queries. Existing approaches largely rely on classifier-based methods that lack interpretability and perform poorly on low-resource languages. To address these limitations, we propose ConsistentGuard, a novel reasoning-based multilingual safeguard that enhances explainability through reasoning and improves knowledge transfer between languages through alignment. With only 1,000 training samples, our method achieves superior performance on three datasets across six languages, outperforming larger models trained on significantly more data, while exhibiting strong interpretability and generalization. We also contribute a multilingual benchmark extension and release our code to support future research.