Can Large Language Models Understand, Reason About, and Generate Code-Switched Text?

📅 2026-01-12
🏛️ arXiv.org
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses the lack of systematic evaluation of large language models' capabilities in understanding, reasoning over, and generating code-mixed (multilingual) text. To this end, we introduce CodeMixQA, a benchmark of high-quality, human-annotated parallel corpora covering 16 code-switched language-pair variants that span multiple geographic regions and code-switching patterns, in both original scripts and transliterated forms. Through question-answering tasks, the benchmark assesses models' comprehension of mixed-language inputs, their cross-lingual reasoning consistency, and the fluency and semantic fidelity of their generated outputs. The study systematically uncovers critical limitations of current large language models in code-mixing scenarios, establishing an empirical baseline and a standardized evaluation framework for developing more robust multilingual models.
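The benchmark's actual data format is not shown on this page, so the following is only a minimal sketch of how a code-switched QA evaluation of this kind might be wired up. The item schema (`question`, `answer`, `language_pair`, `script`) and the `model_fn` callable are illustrative assumptions, not the paper's interface.

```python
from typing import Callable, Dict, List

# Hypothetical item schema -- the real CodeMixQA fields are not shown here.
Item = Dict[str, str]  # keys: "question", "answer", "language_pair", "script"

def evaluate_codemix_qa(items: List[Item], model_fn: Callable[[str], str]) -> Dict[str, float]:
    """Exact-match accuracy per (language_pair, script) slice.

    `model_fn` stands in for any LLM call mapping a prompt to an answer string.
    """
    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for item in items:
        key = f"{item['language_pair']}/{item['script']}"
        pred = model_fn(item["question"]).strip().lower()
        gold = item["answer"].strip().lower()
        total[key] = total.get(key, 0) + 1
        correct[key] = correct.get(key, 0) + int(pred == gold)
    return {k: correct[k] / total[k] for k in total}

# Toy usage with a stub model that always answers "paris".
if __name__ == "__main__":
    items = [
        {"question": "France ki capital kya hai?",  # Hindi-English, romanized
         "answer": "Paris", "language_pair": "hi-en", "script": "transliterated"},
        {"question": "Japan ki capital kya hai?",
         "answer": "Tokyo", "language_pair": "hi-en", "script": "transliterated"},
    ]
    print(evaluate_codemix_qa(items, lambda q: "paris"))
```

Slicing scores by language pair and script mirrors the benchmark's stated goal of comparing original-script and transliterated variants rather than reporting a single aggregate number.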

📝 Abstract
Code-switching is a pervasive phenomenon in multilingual communication, yet the robustness of large language models (LLMs) in mixed-language settings remains insufficiently understood. In this work, we present a comprehensive evaluation of LLM capabilities in understanding, reasoning over, and generating code-switched text. We introduce CodeMixQA, a novel benchmark with high-quality human annotations, comprising 16 diverse parallel code-switched language-pair variants that span multiple geographic regions and code-switching patterns and include both original scripts and their transliterated forms. Using this benchmark, we analyze the reasoning behavior of LLMs on code-switched question-answering tasks, shedding light on how models process and reason over mixed-language inputs. We further conduct a systematic evaluation of LLM-generated synthetic code-switched text, focusing on both naturalness and semantic fidelity, and uncover key limitations in current generation capabilities. Our findings reveal persistent challenges in both reasoning and generation under code-switching conditions and provide actionable insights for building more robust multilingual LLMs. We release the dataset and code as open source.
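One concrete reading of "reasoning consistency": since the benchmark provides parallel variants of each question (e.g. native script vs. transliterated, monolingual vs. code-switched), a model's answers can be compared across variants of the same underlying item. The sketch below computes such an agreement rate; the `item_id` grouping key and the parallel-variant structure are assumptions for illustration, not the paper's published metric.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

def consistency_rate(
    variants: List[Tuple[str, str]],  # (item_id, question text in one variant)
    model_fn: Callable[[str], str],
) -> float:
    """Fraction of items whose parallel variants all receive the same answer.

    Variants sharing an item_id express the same question in different
    scripts or mixing patterns; a consistent model answers them alike.
    """
    answers: Dict[str, Set[str]] = defaultdict(set)
    for item_id, question in variants:
        answers[item_id].add(model_fn(question).strip().lower())
    if not answers:
        return 0.0
    consistent = sum(1 for preds in answers.values() if len(preds) == 1)
    return consistent / len(answers)
```

A stricter variant would score each variant against the gold answer rather than only checking mutual agreement, since a model can be consistently wrong.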
Problem

Research questions and friction points this paper is trying to address.

code-switching
large language models
multilingual communication
mixed-language understanding
code-switched generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

code-switching
large language models
multilingual benchmark
CodeMixQA
synthetic code-switched generation