🤖 AI Summary
This work addresses a common failure of large language models in cross-cultural entity translation: they tend to produce literal or phonetic renderings of entity names instead of the culturally appropriate names used in the target context. To address this, the authors propose EA-RLVR (Entity-Anchored Reinforcement Learning with Verifiable Rewards), a reinforcement learning framework whose supervision is anchored on a verifiable, entity-level reward. Rather than consulting external knowledge bases, the reward encourages the model to draw on parametric knowledge acquired during pretraining, while lightweight structural gates stabilize optimization. The design aims to teach a robust reasoning process rather than imitation of reference translations. On XC-Translate, training on only 7k samples raises the entity translation accuracy of Qwen3-14B from 23.66% to 31.87% on a 50k test set of entirely unseen entities. The gains also transfer to general translation, with +1.35 XCOMET on WMT24++ (+1.59 with extended optimization), indicating out-of-domain generalization.
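To make the idea of a verifiable, entity-level reward behind a structural gate concrete, here is a minimal Python sketch. The output format and gate conditions are not specified here, so the `<translation>` tag, the "exactly one non-empty tag" gate, and the case-insensitive substring match are illustrative assumptions, and `entity_anchored_reward` is a hypothetical name rather than the authors' implementation.

```python
import re
from typing import Optional

# Assumption: the policy wraps its final answer in <translation>...</translation>.
TRANSLATION_TAG = re.compile(r"<translation>(.*?)</translation>", re.DOTALL)


def extract_translation(completion: str) -> Optional[str]:
    """Return the single tagged translation, or None if the format is violated."""
    matches = TRANSLATION_TAG.findall(completion)
    if len(matches) != 1:
        return None  # missing or duplicated tag
    text = matches[0].strip()
    return text or None  # an empty tag also fails the gate


def entity_anchored_reward(completion: str, reference_entity: str) -> float:
    """Hypothetical binary reward anchored on a target-language entity name.

    Structural gate (assumed): malformed outputs receive 0.0 before the
    entity check runs. Entity check (assumed): the reference rendering must
    appear in the translation, ignoring case.
    """
    translation = extract_translation(completion)
    if translation is None:
        return 0.0  # gate closed: format check failed
    if reference_entity.casefold() in translation.casefold():
        return 1.0
    return 0.0
```

The reward is verifiable because it is checked against a known target-language entity rather than scored by a learned model. For example, with the reference `Il Signore degli Anelli`, a tagged Italian sentence containing that title scores 1.0, while a literal English rendering such as `The Lord of the Rings` scores 0.0.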
📝 Abstract
Cross-cultural entity translation remains challenging for large language models (LLMs), which often produce literal or phonetic renderings instead of translations that are culturally appropriate in context. However, the relevant knowledge may already be encoded in model parameters during large-scale pre-training. To incentivize the effective use of this parametric knowledge, we propose EA-RLVR (Entity-Anchored Reinforcement Learning with Verifiable Rewards), a training framework that optimizes cross-cultural entity translation without relying on external knowledge bases. EA-RLVR anchors supervision on a verifiable, entity-level reward signal and incorporates lightweight structural gates to stabilize optimization. This design steers the model toward learning a robust reasoning process rather than merely imitating reference translations. We evaluate EA-RLVR on XC-Translate and observe consistent improvements in both entity translation accuracy and out-of-domain generalization. Specifically, training on only 7k samples raises Qwen3-14B's entity translation accuracy from 23.66% to 31.87% on a 50k test set composed entirely of unseen entities. The learned entity translation ability also transfers to general translation, yielding +1.35 XCOMET on WMT24++, which increases to +1.59 with extended optimization. Extensive analyses of pass@k dynamics and reward formulations attribute these gains to superior sampling efficiency and a stable optimization landscape.
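The pass@k analysis measures whether at least one of k sampled outputs per input is correct. The estimator used is not stated here; a common choice, sketched below as an assumption, is the unbiased estimator introduced with the HumanEval benchmark: draw n ≥ k samples per input, count the c correct ones, and compute 1 - C(n-c, k) / C(n, k), then average over inputs. In this setting a sample might count as correct if it passes an entity-level check; `pass_at_k` and `mean_pass_at_k` are illustrative helper names.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    # 1 - P(all k samples drawn without replacement are incorrect).
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(counts: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over inputs, given one (n, c) pair per input."""
    scores = [pass_at_k(n, c, k) for n, c in counts]
    if not scores:
        raise ValueError("counts must be non-empty")
    return sum(scores) / len(scores)


# Two inputs with 8 samples each: 3 correct, then 0 correct.
print(mean_pass_at_k([(8, 3), (8, 0)], k=1))  # 0.1875
```

For k = 1 the estimator reduces to c / n, so comparing curves across larger k shows whether training mainly sharpens the top sample or also widens the set of inputs for which any sample is correct.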