🤖 AI Summary
This study investigates the logical consistency and cross-lingual alignment capabilities of large language models (LLMs) in multilingual and code-switched settings. Method: We propose the first logic-controllable, synthetic multilingual natural language inference (NLI) evaluation framework: it automatically generates semantically precise premise–hypothesis pairs, constructs diverse test sets via multilingual translation and controlled code-switching, and validates semantic fidelity through embedding similarity analysis and visualization. Contribution/Results: Contrary to expectations, code-switching does not degrade model performance; instead, it enhances cross-lingual reasoning stability—attributed to implicit regularization induced by translation-driven lexical variation. Our work establishes a reproducible benchmark for multilingual LLM evaluation, releases open-source tooling, and introduces the “multilingual augmentation for robustness” paradigm—a novel approach to improving LLM resilience through controlled linguistic mixing.
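The "controlled code-switching" step above can be illustrated as token-level substitution from a bilingual lexicon at a chosen switch ratio. This is a minimal sketch under assumed details (the lexicon, the `ratio` parameter, and random token selection are illustrative choices, not the paper's actual pipeline):

```python
import random

def code_switch(tokens, lexicon, ratio=0.5, seed=0):
    """Replace a controlled fraction of translatable tokens with their
    counterparts from a bilingual lexicon (illustrative sketch only)."""
    rng = random.Random(seed)
    # Indices of tokens that have a translation available
    switchable = [i for i, t in enumerate(tokens) if t.lower() in lexicon]
    k = round(len(switchable) * ratio)
    chosen = set(rng.sample(switchable, k))
    return [lexicon[t.lower()] if i in chosen else t
            for i, t in enumerate(tokens)]

# Toy English->Spanish lexicon (hypothetical)
lexicon = {"cat": "gato", "dog": "perro", "sleeps": "duerme"}
mixed = code_switch("The cat sleeps near the dog".split(), lexicon, ratio=1.0)
# With ratio=1.0 every switchable token is replaced:
# ['The', 'gato', 'duerme', 'near', 'the', 'perro']
```

Varying `ratio` gives graded mixing levels, which is one way such a framework could sweep from fully monolingual to heavily code-switched test items.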
📝 Abstract
Large language models (LLMs) are increasingly applied in multilingual contexts, yet their capacity for consistent, logically grounded alignment across languages remains underexplored. We present a controlled evaluation framework for multilingual natural language inference (NLI) that generates synthetic, logic-based premise–hypothesis pairs and translates them into a typologically diverse set of languages. This design enables precise control over semantic relations and allows testing in both monolingual and mixed-language (code-switched) conditions. Surprisingly, code-switching does not degrade, and can even improve, performance, suggesting that translation-induced lexical variation may serve as a regularization signal. We validate semantic preservation through embedding-based similarity analyses and cross-lingual alignment visualizations, confirming the fidelity of translated pairs. Our findings expose both the potential and the brittleness of current LLM cross-lingual reasoning, and identify code-switching as a promising lever for improving multilingual robustness. Code available at: https://github.com/KurbanIntelligenceLab/nli-stress-testing
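The embedding-based fidelity check can be sketched as a cosine-similarity threshold between the source sentence's embedding and its translation's embedding. The threshold value and the placeholder vectors below are assumptions for illustration; in practice the vectors would come from a multilingual sentence encoder:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_faithful(emb_src, emb_trans, threshold=0.85):
    """Flag a translated pair as semantically preserved when its
    embedding similarity clears a threshold (value is illustrative)."""
    return cosine_similarity(emb_src, emb_trans) >= threshold

# Placeholder vectors standing in for sentence embeddings
src = [0.2, 0.8, 0.1]
trans = [0.25, 0.75, 0.12]
print(is_faithful(src, trans))  # near-identical directions -> True
```

Pairs falling below the threshold would be flagged for review or excluded, keeping only translations whose embeddings stay close to the source.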