🤖 AI Summary
This study addresses the scarcity of multilingual parallel corpora in crisis scenarios, which severely limits the quality of emergency translation. To overcome this challenge, the authors propose a novel approach that integrates domain adaptation with readability optimization. Specifically, they first retrieve and filter data from general-domain corpora to augment limited crisis-related parallel data, which is then used to fine-tune a small language model. Subsequently, preference-based reinforcement learning is employed to generate simplified English translations conforming to the CEFR A2 readability level. This work represents the first effort to jointly incorporate domain adaptation and simplification objectives tailored for emergency communication. Both automatic and human evaluations demonstrate that the resulting translations significantly improve readability while preserving semantic adequacy, confirming the method’s effectiveness and practicality in resource-constrained settings.
📝 Abstract
Timely and reliable multilingual communication is critical during natural and human-induced disasters, but the development of effective crisis-communication solutions is limited by the scarcity of curated parallel data. We propose a domain-adaptive pipeline that expands a small reference corpus by retrieving and filtering data from general corpora. We use the resulting dataset to fine-tune a small language model for crisis-domain translation and then apply preference optimization to bias outputs toward CEFR A2-level English. Automatic and human evaluations show that this approach improves readability while maintaining strong adequacy. Our results indicate that simplified English, combined with domain adaptation, can function as a practical lingua franca for emergency communication when full multilingual coverage is not feasible.
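The corpus-expansion step (retrieve from general corpora, then filter) can be illustrated with a minimal sketch. The abstract does not say which retrieval or filtering method is used, so everything here is an assumption made for illustration: bag-of-words cosine similarity stands in for the retriever, a length-ratio check stands in for the filter, and the function name `expand_corpus`, the thresholds `min_sim` and `max_len_ratio`, and the example sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def expand_corpus(seed_pairs, general_pairs, min_sim=0.3, max_len_ratio=2.0):
    """Keep general-domain (source, english) pairs that resemble the seed corpus.

    Hypothetical helper: thresholds are illustrative, not taken from the paper.
    """
    seed_vecs = [bag_of_words(en) for _, en in seed_pairs]
    kept = []
    for src, en in general_pairs:
        src_len, en_len = len(src.split()), len(en.split())
        if min(src_len, en_len) == 0:
            continue  # drop empty sides
        if max(src_len, en_len) / min(src_len, en_len) > max_len_ratio:
            continue  # filter: length mismatch suggests a misaligned pair
        vec = bag_of_words(en)
        # retrieve: keep the pair if its English side is close to any seed sentence
        if max((cosine(vec, s) for s in seed_vecs), default=0.0) >= min_sim:
            kept.append((src, en))
    return kept


# Toy data, invented for this example.
seed = [("Evacuar la zona inundada ahora", "Evacuate the flooded area now")]
general = [
    ("Evacuar el edificio ahora", "Evacuate the building now"),
    ("Me gusta el helado de chocolate", "I like chocolate ice cream"),
    ("Hola", "Leave the flooded area now and go to the nearest shelter quickly"),
]
expanded = expand_corpus(seed, general)
```

In this toy run only the first general-domain pair survives: the second shares no vocabulary with the seed sentence, and the third is rejected by the length-ratio filter. A real pipeline would likely swap the lexical similarity for a multilingual sentence encoder and add quality filters, but the retrieve-then-filter shape would stay the same.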