Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text

📅 2025-06-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study systematically evaluates large language models’ (LLMs) comprehension of code-switched (CSW) text—a critical challenge amid rising multilingual content. To address this, we construct controlled CSW variants of mainstream benchmarks (e.g., GSM8K, BoolQ) and conduct an empirical analysis across 12 LLMs using zero-shot/few-shot prompting, supervised fine-tuning, and embedding-space diagnostics. We identify, for the first time, a directional CSW effect: accuracy drops by 14.2% on average when English serves as the base language into which other languages are embedded, whereas it increases by 3.7% when non-English languages serve as the base and English is embedded. Leveraging this insight, we propose a constraint-based CSW construction method and demonstrate that supervised fine-tuning mitigates performance degradation far more effectively than prompting—reducing it by up to 89%. These findings reveal intrinsic asymmetries in LLMs’ cross-lingual representations and offer a novel pathway toward robust multilingual modeling.
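
The summary does not spell out how a CSW benchmark variant is built, so the following is only a minimal sketch, assuming naive word-level substitution from a toy English-to-French glossary; the glossary, switch rate, and function name are illustrative assumptions, not the paper's constraint-based method.

```python
import random

# Toy English->French glossary. The paper applies linguistic constraints and
# proper translation; this stand-in dictionary is for illustration only.
GLOSSARY = {
    "apples": "pommes",
    "buys": "achète",
    "more": "plus",
    "have": "avoir",
}

def make_csw_variant(question, switch_rate=0.4, seed=0):
    """Return a naive code-switched variant of an English benchmark question
    by swapping a fraction of known words for their translations."""
    rng = random.Random(seed)
    out = []
    for tok in question.split():
        key = tok.lower().strip(".,?!")
        if key in GLOSSARY and rng.random() < switch_rate:
            out.append(GLOSSARY[key])
        else:
            out.append(tok)
    return " ".join(out)

if __name__ == "__main__":
    q = "Tom has 3 apples and buys 5 more apples. How many apples does he have?"
    print(make_csw_variant(q))
```

Keeping the substitution seeded and rate-controlled is what makes such variants "controlled": the same items can be regenerated with different base/embedded language pairings for comparison.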

📝 Abstract
Code-switching (CSW) is the act of alternating between two or more languages within a single discourse. This phenomenon is widespread in multilingual communities, and increasingly prevalent in online content, where users naturally mix languages in everyday communication. As a result, Large Language Models (LLMs), now central to content processing and generation, are frequently exposed to code-switched inputs. Given their widespread use, it is crucial to understand how LLMs process and reason about such mixed-language text. This paper presents a systematic evaluation of LLM comprehension under code-switching by generating CSW variants of established reasoning and comprehension benchmarks. While degradation is evident when foreign tokens disrupt English text (even under linguistic constraints), embedding English into other languages often improves comprehension. Though prompting yields mixed results, fine-tuning offers a more stable path to degradation mitigation.
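
To make the mitigation figure quoted above concrete, here is a minimal sketch of the bookkeeping; the function names and accuracy values are hypothetical, chosen only to echo the reported "up to 89%" reduction.

```python
def degradation(base_acc, csw_acc):
    """Accuracy drop (in points) from the original benchmark to its CSW variant."""
    return base_acc - csw_acc

def mitigation(drop_before, drop_after):
    """Fraction of the original drop removed by an intervention such as fine-tuning."""
    return 1.0 - drop_after / drop_before if drop_before else 0.0

# Hypothetical accuracies for illustration only.
drop_prompted  = degradation(0.80, 0.660)   # zero-shot: 14-point drop on English-base CSW
drop_finetuned = degradation(0.80, 0.785)   # after fine-tuning: 1.5-point drop
print(f"degradation reduced by {mitigation(drop_prompted, drop_finetuned):.0%}")
```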
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM understanding of code-switched text
Assessing degradation in LLM performance with mixed-language inputs
Exploring methods to mitigate comprehension issues in code-switching
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates CSW variants of benchmarks
Evaluates LLM comprehension under code-switching
Uses fine-tuning to mitigate degradation
🔎 Similar Papers
2024-03-25 · 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering (Forge) · Citations: 22
2024-06-28 · Conference on Empirical Methods in Natural Language Processing · Citations: 20
Amr Mohamed (MBZUAI)
Yang Zhang (Ecole Polytechnique)
M. Vazirgiannis (MBZUAI, Ecole Polytechnique)
Guokan Shang (MBZUAI-IFM Paris Lab)