🤖 AI Summary
Multilingual large language models frequently exhibit language mixing—generating responses in unintended languages—particularly in low-resource language settings, which severely compromises linguistic consistency and user experience. To address this, we first identify an intrinsic deficiency in pretrained models: an inability to discriminate between monolingual and code-mixed text. We then propose an explicit language-mixing penalty built on the ORPO framework that requires no additional annotated data. Our approach models a cross-lingual generation loss and remains robust across decoding temperatures, enabling lightweight, efficient supervised fine-tuning. Experimental results demonstrate that our method significantly reduces language-mixing rates, maintains high language accuracy even under high-temperature decoding, and preserves both generation quality and multilingual capability without degradation.
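The ORPO-style objective described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `lam` weight, and the use of whole-response log-probabilities (rather than token-level log-probabilities in a deep-learning framework) are simplifying assumptions. ORPO combines the standard SFT loss on the preferred (here, monolingual) response with a log-odds-ratio penalty that pushes probability away from the disfavored (language-mixed) response.

```python
import math

def log_odds(logp: float) -> float:
    # odds(y|x) = P(y|x) / (1 - P(y|x)), computed from a log-probability
    p = math.exp(logp)
    return math.log(p) - math.log(1.0 - p)

def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """Illustrative ORPO-style loss for language-mixing suppression.

    logp_chosen   -- log-probability of the monolingual (preferred) response
    logp_rejected -- log-probability of the language-mixed (disfavored) response
    lam           -- penalty weight (hypothetical value)
    """
    # Standard SFT term: negative log-likelihood of the preferred response.
    nll = -logp_chosen
    # Odds-ratio penalty: -log sigmoid(log-odds ratio), written as
    # log(1 + exp(-ratio)) for numerical stability.
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    penalty = math.log1p(math.exp(-ratio))
    return nll + lam * penalty
```

The penalty term shrinks as the model assigns less probability to the language-mixed response, so minimizing the combined loss explicitly discourages code-mixed generations while the NLL term preserves standard fine-tuning behavior.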
📝 Abstract
Large language models often suffer from language confusion, a phenomenon where responses are partially or entirely generated in unintended languages. This can critically impact user experience in low-resource settings. We hypothesize that conventional supervised fine-tuning exacerbates this issue because the softmax objective concentrates probability mass only on the single correct token and does not explicitly penalize cross-lingual mixing. Interestingly, by examining loss trajectories during pretraining, we observe that models fail to learn to distinguish between monolingual and language-confused text. Additionally, we find that ORPO, which augments standard SFT with a penalty on unwanted output styles, effectively suppresses language-confused generations even at high decoding temperatures without degrading overall model performance. Our findings suggest that incorporating appropriate penalty terms can mitigate language confusion in low-resource settings with limited data.