🤖 AI Summary
Large language models (LLMs) frequently produce harmful language mixing: unintended, semantically incoherent insertions of non-dominant languages. Existing mitigation strategies fail to address this robustly: fine-tuning-based approaches require costly retraining, while detection methods struggle to distinguish harmful mixing from legitimate code-switching.
Method: We propose a lightweight, plug-and-play, decoding-time language-aware filtering mechanism that operates without modifying the base model. Our approach models language preference bias via token embedding norm disparities and employs norm-regularized self-distillation to train a gating module for precise identification and selective masking of harmful mixing. It further enables dynamic language-family identification and fine-grained decoding control.
Contribution/Results: Evaluated across Qwen3, GPT-OSS, Gemma3, and Llama3.1, our method reduces language mixing rates by an order of magnitude on average, with no degradation in downstream task performance.
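The selective-masking step of the method can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the real system uses a learned gating module trained via norm-regularized self-distillation to decide which language families are acceptable at each step, whereas here the allowed families and the token-to-family mapping are supplied directly as hypothetical inputs.

```python
import math

def apply_language_gate(logits, token_family, allowed_families):
    """Mask next-token logits so only tokens from allowed language
    families can be sampled; disallowed tokens are set to -inf.

    Sketch only: `token_family` (token id -> family id, e.g. by script)
    and `allowed_families` stand in for the learned gate's predictions.
    """
    return [x if fam in allowed_families else -math.inf
            for x, fam in zip(logits, token_family)]

# Toy vocabulary of 6 tokens; family 0 = dominant language, 1 = intruder.
logits = [2.0, 1.5, 3.0, 0.5, 2.5, 1.0]
families = [0, 0, 1, 1, 0, 1]

gated = apply_language_gate(logits, families, allowed_families={0})
best = max(range(len(gated)), key=gated.__getitem__)
# Ungated argmax would be token 2 (an intruding-family token);
# after gating, the best allowed token is token 4.
```

Because the mask is applied only to the logits of a single decoding step, the base model's weights are untouched, which is what makes the approach plug-and-play.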
📝 Abstract
Large language models (LLMs) often experience language confusion, which is the unintended mixing of languages during text generation. Current solutions to this problem either necessitate model retraining or cannot differentiate between harmful confusion and acceptable code-switching. This paper introduces the Language Confusion Gate (LCG), a lightweight, plug-in solution that filters tokens during decoding without altering the base LLM. The LCG is trained using norm-adjusted self-distillation to predict appropriate language families and apply masking only when needed. Our method is based on the findings that language confusion is infrequent, correct-language tokens are usually among the top predictions, and output token embedding norms are larger for high-resource languages, which biases sampling. When evaluated across various models, including Qwen3, GPT-OSS, Gemma3, and Llama3.1, LCG decreases language confusion significantly, often by an order of magnitude, without negatively impacting task performance. Code is available at https://github.com/collinzrj/language_confusion_gate.
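The embedding-norm bias mentioned in the abstract can be seen with toy numbers. Since the logit for a token is the dot product of the hidden state with that token's output embedding, scaling the embedding norm scales the logit, so a larger-norm (typically high-resource-language) token gets a higher sampling probability even when both embeddings point in the same direction. The vectors below are made up for illustration:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hidden state and two hypothetical output embeddings with the same
# direction; the "high-resource" token's embedding has twice the norm.
hidden = [0.5, -0.2, 0.8, 0.1]
e_low  = [0.5, -0.2, 0.8, 0.1]   # lower-norm token embedding
e_high = [1.0, -0.4, 1.6, 0.2]   # same direction, 2x the norm

# Logits are dot products with the hidden state, so norm scales the logit.
logit_low, logit_high = dot(hidden, e_low), dot(hidden, e_high)

# Softmax over just these two candidates: the larger-norm token dominates
# despite identical directional alignment with the hidden state.
p_high = math.exp(logit_high) / (math.exp(logit_low) + math.exp(logit_high))
```

This is the bias the gate's norm-adjusted self-distillation is designed to correct for during training.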