Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text

📅 2025-06-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study systematically evaluates large language models’ (LLMs) comprehension of code-switched (CSW) text—a critical challenge amid rising multilingual content. To address this, we construct controlled CSW variants of mainstream benchmarks (e.g., GSM8K, BoolQ) and conduct an empirical analysis across 12 LLMs using zero-shot/few-shot prompting, supervised fine-tuning, and embedding-space diagnostics. We identify, for the first time, a directional CSW effect: accuracy drops by 14.2% on average when English serves as the base language into which other languages are embedded, whereas it increases by 3.7% when non-English languages serve as the base and English is embedded. Leveraging this insight, we propose a constraint-based CSW construction method and demonstrate that supervised fine-tuning mitigates performance degradation far more effectively than prompting—reducing it by up to 89%. These findings reveal intrinsic asymmetries in LLMs’ cross-lingual representations and offer a novel pathway toward robust multilingual modeling.
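
The summary does not spell out how a CSW benchmark variant is built, so the following is only a minimal sketch, assuming naive word-level substitution from a toy English-to-French glossary; the glossary, switch rate, and function name are illustrative assumptions, not the paper's constraint-based method.

```python
import random

# Toy English->French glossary. The paper applies linguistic constraints and
# proper translation; this stand-in dictionary is for illustration only.
GLOSSARY = {
    "apples": "pommes",
    "buys": "achète",
    "more": "plus",
    "have": "avoir",
}

def make_csw_variant(question, switch_rate=0.4, seed=0):
    """Return a naive code-switched variant of an English benchmark question
    by swapping a fraction of known words for their translations."""
    rng = random.Random(seed)
    out = []
    for tok in question.split():
        key = tok.lower().strip(".,?!")
        if key in GLOSSARY and rng.random() < switch_rate:
            out.append(GLOSSARY[key])
        else:
            out.append(tok)
    return " ".join(out)

if __name__ == "__main__":
    q = "Tom has 3 apples and buys 5 more apples. How many apples does he have?"
    print(make_csw_variant(q))
```

Keeping the substitution seeded and rate-controlled is what makes such variants "controlled": the same items can be regenerated with different base/embedded language pairings for comparison.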

📝 Abstract
Code-switching (CSW) is the act of alternating between two or more languages within a single discourse. This phenomenon is widespread in multilingual communities, and increasingly prevalent in online content, where users naturally mix languages in everyday communication. As a result, Large Language Models (LLMs), now central to content processing and generation, are frequently exposed to code-switched inputs. Given their widespread use, it is crucial to understand how LLMs process and reason about such mixed-language text. This paper presents a systematic evaluation of LLM comprehension under code-switching by generating CSW variants of established reasoning and comprehension benchmarks. While degradation is evident when foreign tokens disrupt English text (even under linguistic constraints), embedding English into other languages often improves comprehension. Though prompting yields mixed results, fine-tuning offers a more stable path to degradation mitigation.
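
To make the mitigation figure quoted above concrete, here is a minimal sketch of the bookkeeping; the function names and accuracy values are hypothetical, chosen only to echo the reported "up to 89%" reduction.

```python
def degradation(base_acc, csw_acc):
    """Accuracy drop (in points) from the original benchmark to its CSW variant."""
    return base_acc - csw_acc

def mitigation(drop_before, drop_after):
    """Fraction of the original drop removed by an intervention such as fine-tuning."""
    return 1.0 - drop_after / drop_before if drop_before else 0.0

# Hypothetical accuracies for illustration only.
drop_prompted  = degradation(0.80, 0.660)   # zero-shot: 14-point drop on English-base CSW
drop_finetuned = degradation(0.80, 0.785)   # after fine-tuning: 1.5-point drop
print(f"degradation reduced by {mitigation(drop_prompted, drop_finetuned):.0%}")
```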
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM understanding of code-switched text
Assessing degradation in LLM performance with mixed-language inputs
Exploring methods to mitigate comprehension issues in code-switching
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates CSW variants of benchmarks
Evaluates LLM comprehension under code-switching
Uses fine-tuning to mitigate degradation
🔎 Similar Papers
2024-03-25 · 2024 IEEE/ACM First International Conference on AI Foundation Models and Software Engineering (Forge) · Citations: 22
2024-06-28 · Conference on Empirical Methods in Natural Language Processing · Citations: 20
Amr Mohamed (MBZUAI)
Yang Zhang (Ecole Polytechnique)
M. Vazirgiannis (MBZUAI, Ecole Polytechnique)
Guokan Shang (MBZUAI-IFM Paris Lab)