Source-Grounded Semantic Reinforcement Learning for Low-Resource Target-Language Generation

📅 2026-05-28
🤖 AI Summary
This work addresses low-resource target-language generation, where parallel corpora are scarce and abundant source-language monolingual data is difficult to exploit with standard supervised fine-tuning. The authors propose SG-SRL, a framework that converts source-language monolingual data into cross-lingual semantic supervision. A cross-lingual reranker serves as a reference-free semantic reward model, scoring the semantic relevance between the source input and the target-language generation to guide reinforcement learning. Because this RL stage induces severe verbosity-based reward hacking, a lightweight recovery stage fine-tunes the model on a small parallel corpus to restore fluency, conciseness, and task format while preserving the semantic gains. On Chinese-to-Thai generation, the approach improves semantic grounding and factual coverage over cold-start SFT. Additional analyses on Tibetan show that an encoder-based (embedding) semantic reward can substitute for an LLM-based reranker in a realistic low-resource setting.
📝 Abstract
Low-resource target-language generation is often limited by scarce parallel data, while high-resource source-language monolingual data is abundant but difficult to use with standard supervised fine-tuning. We propose Source-Grounded Semantic Reinforcement Learning (SG-SRL), a resource-utilization framework that converts source-language monolingual data into cross-lingual semantic supervision for target-language generation. SG-SRL performs reference-free reinforcement learning (RL) on source-language data using a cross-lingual semantic reward model, instantiated by a cross-lingual reranker that scores the semantic relevance between the source input and the target-language generation. While this induces severe verbosity-based reward hacking, a lightweight recovery stage using a small parallel corpus restores fluency, conciseness, and task format while preserving the semantic gains. Experiments on Chinese-to-Thai generation show that SG-SRL improves semantic grounding and factual coverage over cold-start SFT. Additional analyses on long-form transfer and Tibetan embedding-based rewards clarify the generalization behavior of SG-SRL and show that an encoder-based semantic reward can substitute for an LLM-based reranker in a realistic low-resource language setting.
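The reference-free reward described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper instantiates the reward with a cross-lingual reranker, whereas this example assumes a simpler embedding-based scorer (cosine similarity), loosely in the spirit of the encoder-based reward mentioned for Tibetan. The names `semantic_reward`, `pick_best`, `toy_embed`, and the `TOY_EMBEDDINGS` table are hypothetical, and the vectors are made up purely for illustration.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_reward(source: str, generation: str,
                    embed: Callable[[str], Vector]) -> float:
    """Reference-free reward: compares the source input directly with the
    target-language generation. No reference translation is involved.
    `embed` is an assumed cross-lingual encoder supplied by the caller."""
    return cosine(embed(source), embed(generation))


def pick_best(source: str, candidates: Sequence[str],
              embed: Callable[[str], Vector]) -> str:
    """Return the candidate generation with the highest semantic reward."""
    return max(candidates, key=lambda c: semantic_reward(source, c, embed))


# Hypothetical stand-in for a real cross-lingual encoder: a fixed lookup
# table with invented 2-d vectors (assumption, for illustration only).
TOY_EMBEDDINGS = {
    "今天下雨": (1.0, 0.0),      # Chinese source: "it is raining today"
    "วันนี้ฝนตก": (0.9, 0.1),    # Thai: "it is raining today"
    "ฉันชอบแมว": (0.0, 1.0),    # Thai: "I like cats"
}


def toy_embed(text: str) -> Vector:
    return TOY_EMBEDDINGS[text]


if __name__ == "__main__":
    src = "今天下雨"
    for cand in ("วันนี้ฝนตก", "ฉันชอบแมว"):
        print(cand, round(semantic_reward(src, cand, toy_embed), 3))
```

In an RL loop, a score of this kind would be computed for each sampled generation and used as the scalar reward. Since the reward only measures relevance to the source, nothing in it discourages overly long outputs, which is consistent with the verbosity-based reward hacking the abstract reports and with the need for a separate recovery stage.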
Problem

Research questions and friction points this paper is trying to address.

low-resource target-language generation
parallel data scarcity
monolingual data utilization
cross-lingual semantic supervision
semantic grounding
Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic reinforcement learning
cross-lingual transfer
low-resource generation
reward hacking mitigation
monolingual data utilization