Lost in Translation? A Comparative Study on the Cross-Lingual Transfer of Composite Harms

📅 2026-02-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Safety evaluations of large language models remain largely confined to English, and cross-lingual transfer via translation often fails to capture how composite harmful content manifests in non-English contexts. This work introduces CompositeHarm, a multilingual safety benchmark that pairs structured adversarial attacks with contextually realistic harm data across six languages, including five Indic languages, to build a lightweight, scalable, and environmentally conscious evaluation framework. Experiments show that adversarial syntactic attacks achieve markedly higher success rates in the Indic languages, while contextual harms transfer more moderately. These findings indicate that translation-based approaches alone are insufficient for language-adaptive safety alignment, and they motivate the proposed framework for evaluating cross-lingual safety robustness.

📝 Abstract
Most safety evaluations of large language models (LLMs) remain anchored in English. Translation is often used as a shortcut to probe multilingual behavior, but it rarely captures the full picture, especially when harmful intent or structure morphs across languages. Some types of harm survive translation almost intact, while others distort or disappear. To study this effect, we introduce CompositeHarm, a translation-based benchmark designed to examine how safety alignment holds up as both syntax and semantics shift. It combines two complementary English datasets, AttaQ, which targets structured adversarial attacks, and MMSafetyBench, which covers contextual, real-world harms, and extends them into six languages: English, Hindi, Assamese, Marathi, Kannada, and Gujarati. Using three large models, we find that attack success rates rise sharply in Indic languages, especially under adversarial syntax, while contextual harms transfer more moderately. To ensure scalability and energy efficiency, our study adopts lightweight inference strategies inspired by edge-AI design principles, reducing redundant evaluation passes while preserving cross-lingual fidelity. This design makes large-scale multilingual safety testing both computationally feasible and environmentally conscious. Overall, our results show that translated benchmarks are a necessary first step, but not a sufficient one, toward building grounded, resource-aware, language-adaptive safety systems.
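The abstract's central measurement is the attack success rate (ASR) per language: the fraction of harmful prompts for which a model produces an unsafe response, compared across the six languages. A minimal sketch of that computation is below; the function name, record format, and sample values are illustrative assumptions, not the paper's actual evaluation code.

```python
from collections import defaultdict

def attack_success_rate(results):
    """Per-language attack success rate: the fraction of prompts
    in each language judged to have elicited an unsafe response.

    `results` is an iterable of (language, judged_unsafe) pairs.
    """
    totals, successes = defaultdict(int), defaultdict(int)
    for lang, unsafe in results:
        totals[lang] += 1
        if unsafe:
            successes[lang] += 1
    return {lang: successes[lang] / totals[lang] for lang in totals}

# Hypothetical judged outputs; the flags here are made up for illustration.
records = [
    ("en", False), ("en", True),
    ("hi", True), ("hi", True),
]
print(attack_success_rate(records))  # {'en': 0.5, 'hi': 1.0}
```

Comparing these per-language rates, as the paper does between English and the Indic languages, is what reveals whether safety alignment degrades under translation.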
Problem

Research questions and friction points this paper is trying to address.

cross-lingual transfer
composite harms
safety evaluation
large language models
translation fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-lingual transfer
composite harms
lightweight inference
safety alignment
multilingual benchmark
Vaibhav Shukla
Indian Institute Of Information Technology, Allahabad
Hardik Sharma
Google
Deep Learning · Computer Architecture · Hardware Acceleration · Approximate Computing
Adith N Reganti
Indian Institute Of Information Technology, Allahabad
Soham Wasmatkar
Manipal University Jaipur
Bagesh Kumar
Manipal University Jaipur
Vrijendra Singh
Professor, IT Dept. @ Indian Institute of Information Technology Allahabad
ML & GenAI · Data Analytics · Social Network Analysis · Information Security · Time Series Analytics