🤖 AI Summary
Large language models (LLMs) frequently produce harmful language mixing: unintended, semantically incoherent insertions of non-dominant languages. Existing mitigation strategies fail to address this robustly: fine-tuning-based approaches require costly retraining, while detection methods struggle to distinguish harmful mixing from legitimate code-switching.
Method: We propose a lightweight, plug-and-play, decoding-time language-aware filtering mechanism that operates without modifying the base model. Our approach models language preference bias via token embedding norm disparities and employs norm-regularized self-distillation to train a gating module for precise identification and selective masking of harmful mixing. It further enables dynamic language-family identification and fine-grained decoding control.
Contribution/Results: Evaluated across Qwen3, GPT-OSS, Gemma3, and Llama3.1, our method reduces language mixing rates by an order of magnitude on average, with no degradation in downstream task performance.
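The selective-masking step of the method can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the real system uses a learned gating module trained via norm-regularized self-distillation to decide which language families are acceptable at each step, whereas here the allowed families and the token-to-family mapping are supplied directly as hypothetical inputs.

```python
import math

def apply_language_gate(logits, token_family, allowed_families):
    """Mask next-token logits so only tokens from allowed language
    families can be sampled; disallowed tokens are set to -inf.

    Sketch only: `token_family` (token id -> family id, e.g. by script)
    and `allowed_families` stand in for the learned gate's predictions.
    """
    return [x if fam in allowed_families else -math.inf
            for x, fam in zip(logits, token_family)]

# Toy vocabulary of 6 tokens; family 0 = dominant language, 1 = intruder.
logits = [2.0, 1.5, 3.0, 0.5, 2.5, 1.0]
families = [0, 0, 1, 1, 0, 1]

gated = apply_language_gate(logits, families, allowed_families={0})
best = max(range(len(gated)), key=gated.__getitem__)
# Ungated argmax would be token 2 (an intruding-family token);
# after gating, the best allowed token is token 4.
```

Because the mask is applied only to the logits of a single decoding step, the base model's weights are untouched, which is what makes the approach plug-and-play.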
📝 Abstract
Large language models (LLMs) often experience language confusion, which is the unintended mixing of languages during text generation. Current solutions to this problem either necessitate model retraining or cannot differentiate between harmful confusion and acceptable code-switching. This paper introduces the Language Confusion Gate (LCG), a lightweight, plug-in solution that filters tokens during decoding without altering the base LLM. The LCG is trained using norm-adjusted self-distillation to predict appropriate language families and apply masking only when needed. Our method is based on the findings that language confusion is infrequent, correct-language tokens are usually among the top predictions, and output token embedding norms are larger for high-resource languages, which biases sampling. When evaluated across various models, including Qwen3, GPT-OSS, Gemma3, and Llama3.1, LCG decreases language confusion significantly, often by an order of magnitude, without negatively impacting task performance. Code is available at https://github.com/collinzrj/language_confusion_gate.
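The embedding-norm bias mentioned in the abstract can be seen with toy numbers. Since the logit for a token is the dot product of the hidden state with that token's output embedding, scaling the embedding norm scales the logit, so a larger-norm (typically high-resource-language) token gets a higher sampling probability even when both embeddings point in the same direction. The vectors below are made up for illustration:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hidden state and two hypothetical output embeddings with the same
# direction; the "high-resource" token's embedding has twice the norm.
hidden = [0.5, -0.2, 0.8, 0.1]
e_low  = [0.5, -0.2, 0.8, 0.1]   # lower-norm token embedding
e_high = [1.0, -0.4, 1.6, 0.2]   # same direction, 2x the norm

# Logits are dot products with the hidden state, so norm scales the logit.
logit_low, logit_high = dot(hidden, e_low), dot(hidden, e_high)

# Softmax over just these two candidates: the larger-norm token dominates
# despite identical directional alignment with the hidden state.
p_high = math.exp(logit_high) / (math.exp(logit_low) + math.exp(logit_high))
```

This is the bias the gate's norm-adjusted self-distillation is designed to correct for during training.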