AI Summary
This work addresses machine translation from Coptic to English in an extremely low-resource setting by proposing a novel approach that integrates in-context learning, bilingual dictionary retrieval, and syntactic information from Universal Dependencies parses. For the first time, syntactic structure is incorporated into the in-context learning framework through natural-language descriptions of the parse and explicit instructions, which jointly guide the model through complex sentence constructions. Combined with dictionary retrieval, the syntactic information yields significant improvements in translation quality across models of varying sizes and establishes a new state of the art for Coptic-to-English translation. These results indicate that syntax-aware in-context learning is effective, and generalizes across model sizes, when parallel data is severely limited.
Abstract
Low-resource machine translation requires methods that differ from those used for high-resource languages. This paper proposes a novel in-context learning approach to low-resource machine translation from Coptic to English, with syntactic augmentation from Universal Dependencies parses of the input sentences. Building on existing work that uses bilingual dictionaries to support inference for vocabulary items, we add several representations of syntactic analyses to our inputs: raw parser outputs, verbalizations of parses in plain English, and targeted instructions that flag difficult constructions identified in sub-trees and explain how they can be translated. Our results show that while syntactic information alone is not as useful as dictionary-based glosses, combining retrieved dictionary items with syntactic information yields significant gains across model sizes and sets a new state of the art for Coptic translation.
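To make the idea of a syntax-augmented prompt concrete, the following is a minimal sketch of how dictionary glosses and a plain-English verbalization of a Universal Dependencies parse might be assembled into one in-context prompt. It is not the authors' implementation: the function names (`parse_conllu`, `verbalize`, `retrieve_glosses`, `build_prompt`), the prompt wording, and the exact-match dictionary lookup are all assumptions made for illustration, and the tokens `w1` and `w2` are placeholders rather than real Coptic. The only fixed convention it relies on is the 10-column, tab-separated CoNLL-U layout.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    idx: int      # 1-based token ID from CoNLL-U column 1
    form: str     # surface form (column 2)
    upos: str     # universal POS tag (column 4)
    head: int     # ID of the syntactic head, 0 for the root (column 7)
    deprel: str   # dependency relation to the head (column 8)


def parse_conllu(block: str) -> List[Token]:
    """Read the token lines of one CoNLL-U sentence, skipping comments."""
    tokens = []
    for line in block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():  # skip multiword ranges (1-2) and empty nodes (1.1)
            continue
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


def verbalize(tokens: List[Token]) -> List[str]:
    """Describe each dependency arc in plain English (hypothetical wording)."""
    by_idx = {t.idx: t for t in tokens}
    lines = []
    for t in tokens:
        if t.head == 0:
            lines.append(f'"{t.form}" ({t.upos}) is the root of the sentence.')
        else:
            head_form = by_idx[t.head].form
            lines.append(f'"{t.form}" ({t.upos}) is the {t.deprel} of "{head_form}".')
    return lines


def retrieve_glosses(tokens: List[Token], dictionary: Dict[str, str]) -> List[str]:
    """Exact-match lookup; a real system would likely need lemma-based retrieval."""
    return [f"{t.form}: {dictionary[t.form]}" for t in tokens if t.form in dictionary]


def build_prompt(source: str, conllu: str, dictionary: Dict[str, str]) -> str:
    """Combine the source sentence, glosses, and verbalized parse into one prompt."""
    tokens = parse_conllu(conllu)
    parts = [
        "Translate the following Coptic sentence into English.",
        f"Sentence: {source}",
        "Dictionary entries:",
        *retrieve_glosses(tokens, dictionary),
        "Syntactic analysis:",
        *verbalize(tokens),
        "English translation:",
    ]
    return "\n".join(parts)


# Toy input: placeholder tokens and an invented gloss, not real Coptic data.
TOY_CONLLU = "\n".join([
    "# placeholder sentence",
    "\t".join(["1", "w1", "_", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["2", "w2", "_", "NOUN", "_", "_", "1", "obj", "_", "_"]),
])
TOY_DICT = {"w2": "house"}

prompt = build_prompt("w1 w2", TOY_CONLLU, TOY_DICT)
print(prompt)
```

In this sketch the dictionary section and the syntactic section are independent blocks of the prompt, so either can be dropped to compare the dictionary-only, syntax-only, and combined conditions the abstract contrasts. Targeted instructions for difficult constructions could be added as a further block keyed on specific `deprel` patterns, but which constructions to flag and how to phrase the instructions are not specified here.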