🤖 AI Summary
This work addresses the security vulnerabilities of large language models (LLMs) in low-resource languages (LRLs), which stem from scarce training data and imbalanced evaluation resources. We propose a scalable, automated framework for multilingual security vulnerability assessment that integrates cross-lingual adversarial sample generation, response consistency scoring, and human-in-the-loop calibration, enabling systematic safety evaluation of six mainstream LLMs across eight languages, six of them low-resource. Experimental results show that LRL security weaknesses arise primarily from degraded model performance rather than from failures of safety alignment, and the automated assessments agree with human judgments in over 85% of cases. Crucially, this study is the first to demonstrate that LRL security risks are fundamentally non-adversarial in nature: they are rooted in generalization failures rather than targeted jailbreaking. The framework establishes a reproducible, extensible methodological foundation for multilingual safety evaluation, advancing both empirical rigor and practical applicability in LRL security research.
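The reported 85% figure is a simple agreement rate between automated and human safety labels. As a minimal sketch of how such a validation number can be computed (function name, label vocabulary, and data below are illustrative assumptions, not from the paper):

```python
def agreement_rate(auto_labels, human_labels):
    """Fraction of samples where the automated label matches the human label."""
    if len(auto_labels) != len(human_labels):
        raise ValueError("label lists must be the same length")
    matches = sum(a == h for a, h in zip(auto_labels, human_labels))
    return matches / len(auto_labels)

# Illustrative example: 6 of 7 labels agree, i.e. ~85.7% agreement.
auto = ["safe", "unsafe", "safe", "safe", "unsafe", "safe", "safe"]
human = ["safe", "unsafe", "safe", "unsafe", "unsafe", "safe", "safe"]
print(round(agreement_rate(auto, human), 3))  # 0.857
```

In practice such a comparison would be run per language, since the paper validates the framework against human evaluation in two languages.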
📝 Abstract
Large Language Models (LLMs) are acquiring a wider range of capabilities, including understanding and responding in multiple languages. Although they undergo safety training to prevent them from answering harmful or illegal questions, imbalances in training data and human evaluation resources can make these models more susceptible to attacks in low-resource languages (LRLs). This paper proposes a framework to automatically assess the multilingual vulnerabilities of commonly used LLMs. Using our framework, we evaluated six LLMs across eight languages representing varying levels of resource availability. We validated the assessments generated by our automated framework through human evaluation in two languages, demonstrating that the framework's results align with human judgments in most cases. Our findings reveal vulnerabilities in LRLs; however, these may pose minimal risk, as they often stem from the model's poor performance, which results in incoherent responses.