🤖 AI Summary
Large language models (LLMs) perform unevenly across languages in enterprise multilingual applications: accuracy on mid- and low-resource languages is notably lower than on English, which undermines reliability in customer support, content moderation, and similar critical use cases. Even within retrieval-augmented generation (RAG) systems, non-English accuracy lags behind English by up to 29%. To address this, we propose Batch-level Cross-lingual Alignment Tuning (BCAT), a supervised fine-tuning method that aligns hidden-layer representations and generated outputs across semantically equivalent multilingual samples within each training batch, without altering model architecture or inference procedures. BCAT preserves English performance, reasoning capability, and retrieval quality while boosting non-English accuracy by up to 23.9%. The result is a more consistent, robust, and fair multilingual LLM that integrates seamlessly with existing RAG deployment pipelines.
📝 Abstract
Large language models (LLMs) remain unreliable for global enterprise applications due to substantial performance gaps between high-resource and mid- and low-resource languages, driven by English-centric pretraining and internal reasoning biases. This inconsistency undermines customer experience and operational reliability in multilingual settings such as customer support, content moderation, and information retrieval. Even with advanced Retrieval-Augmented Generation (RAG) systems, we observe an accuracy drop of up to 29% in non-English languages compared to English.
We propose a practical, batch-wise alignment strategy for fine-tuning LLMs, leveraging semantically equivalent multilingual data in each training batch to directly align model outputs across languages. This approach improves non-English accuracy by up to 23.9% without compromising English performance, model reasoning, or retrieval quality. Our method is simple to implement, scalable, and integrates seamlessly with existing LLM training and deployment pipelines, enabling more robust and equitable multilingual AI solutions in industry.
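To make the batch-wise idea concrete, the following is a minimal sketch of one way an alignment term could be computed over a batch that contains translations of the same example. It is not the authors' implementation: the abstract does not specify the loss, so the KL divergence toward an English anchor, the `group_id` and `lang` fields, the toy per-sample `logits`, and the `weight` hyperparameter are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def group_by_meaning(batch):
    """Group samples sharing a (hypothetical) group_id, i.e. mutual translations."""
    groups = defaultdict(list)
    for sample in batch:
        groups[sample["group_id"]].append(sample)
    return groups


def alignment_loss(batch, anchor_lang="en"):
    """Mean KL(anchor || other) over non-anchor samples whose group has an anchor.

    Assumption: the anchor-language output is the alignment target.
    """
    total, count = 0.0, 0
    for samples in group_by_meaning(batch).values():
        anchors = [s for s in samples if s["lang"] == anchor_lang]
        if not anchors:
            continue  # nothing to align against in this group
        p = softmax(anchors[0]["logits"])
        for s in samples:
            if s["lang"] == anchor_lang:
                continue
            total += kl_divergence(p, softmax(s["logits"]))
            count += 1
    return total / count if count else 0.0


def total_loss(task_loss, batch, weight=0.1):
    """Supervised task loss plus a weighted alignment term (weight is assumed)."""
    return task_loss + weight * alignment_loss(batch)


# Toy batch: one question in three languages, with logits over three answer options.
batch = [
    {"group_id": 0, "lang": "en", "logits": [2.0, 0.5, 0.1]},
    {"group_id": 0, "lang": "de", "logits": [1.8, 0.6, 0.2]},
    {"group_id": 0, "lang": "sw", "logits": [0.3, 1.5, 0.4]},
]
print(alignment_loss(batch))
```

In this sketch the alignment term only adds to the training loss, so the model architecture and the inference path stay unchanged, which matches the abstract's claim that the method fits existing training and deployment pipelines. In a real fine-tuning setup the logits would come from the model's forward pass and the loss would be computed with an autodiff framework rather than plain Python.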