🤖 AI Summary
This work addresses the lack of systematic evaluation of large language models' abilities to understand, reason over, and generate code-mixed (mixed-language) text. To this end, we introduce CodeMixQA, a benchmark of high-quality, human-annotated parallel corpora covering 16 code-switched language-pair variants that span multiple geographic regions and code-mixing patterns, in both original scripts and transliterated forms. Through question-answering tasks, the benchmark comprehensively assesses models' comprehension of mixed-language inputs, their cross-lingual reasoning consistency, and the fluency and semantic fidelity of their generated outputs. Our study is the first to systematically uncover critical limitations of current large language models in code-mixing scenarios, establishing an empirical foundation and a standardized evaluation framework for developing more robust multilingual models.
📝 Abstract
Code-switching is a pervasive phenomenon in multilingual communication, yet the robustness of large language models (LLMs) in mixed-language settings remains insufficiently understood. In this work, we present a comprehensive evaluation of LLM capabilities in understanding, reasoning over, and generating code-switched text. We introduce CodeMixQA, a novel benchmark with high-quality human annotations, comprising 16 diverse parallel code-switched language-pair variants that span multiple geographic regions and code-switching patterns and include both original scripts and their transliterated forms. Using this benchmark, we analyze the reasoning behavior of LLMs on code-switched question-answering tasks, shedding light on how models process and reason over mixed-language inputs. We further conduct a systematic evaluation of LLM-generated synthetic code-switched text, focusing on both naturalness and semantic fidelity, and uncover key limitations in current generation capabilities. Our findings reveal persistent challenges in both reasoning and generation under code-switching conditions and provide actionable insights for building more robust multilingual LLMs. We release the dataset and code as open source.