🤖 AI Summary
Low-resource neural machine translation (NMT) suffers from a scarcity of in-context examples; existing selection methods rely on costly manual annotation or external resources. Method: This paper proposes DAT, an unsupervised in-context example generation framework built on large language models (LLMs). DAT generates candidate sentence pairs via prompting and jointly optimizes their selection under relevance and diversity criteria to identify high-quality parallel sentences. Contribution/Results: DAT is the first method to construct in-context examples without any external resources or human annotation, and its generated pairs can be accumulated incrementally and reused as a demonstration pool at inference time. Evaluated on multiple low-resource language pairs, DAT-generated examples yield BLEU improvements of 2.1–3.8 points over strong baselines, demonstrating the effectiveness and practicality of unsupervised, dynamic in-context example generation for low-resource NMT.
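The joint relevance/diversity selection described above can be sketched as a greedy, MMR-style loop: score each LLM-generated candidate pair by its relevance to the test input, penalized by its redundancy with already-selected examples. This is a minimal illustration, not the paper's actual implementation; the token-level Jaccard similarity, the `diversity_weight` parameter, and the greedy strategy are all assumed stand-ins for whatever scoring DAT uses.

```python
def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (illustrative stand-in for a relevance metric)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def select_demonstrations(test_src, candidates, k=2, diversity_weight=0.5):
    """Greedily pick k (source, target) pairs from LLM-generated candidates,
    balancing relevance to the test input against diversity among picks."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def score(pair):
            relevance = jaccard(test_src, pair[0])
            # Redundancy: maximal similarity to any already-selected source.
            redundancy = max((jaccard(pair[0], s[0]) for s in selected), default=0.0)
            return relevance - diversity_weight * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two near-duplicate candidates and one unrelated candidate, the diversity penalty pushes the second pick away from the near-duplicate even though it is more relevant in isolation.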
📝 Abstract
Large language models (LLMs) have demonstrated strong performance across various tasks, leveraging their exceptional in-context learning ability with only a few examples. Accordingly, the selection of optimal in-context examples has been actively studied in the field of machine translation. However, these studies presuppose the presence of a demonstration pool of human-annotated pairs, making them less applicable to low-resource languages, where such an assumption is difficult to meet. To overcome this limitation, this paper explores in-context example generation for machine translation. Specifically, we propose Demonstration Augmentation for Translation (DAT), a simple yet effective approach that generates example pairs without relying on any external resources. The method builds upon two criteria, relevance and diversity, which previous work has highlighted as key factors for in-context example selection. Through experiments and analysis on low-resource languages where human-annotated pairs are scarce, we show that DAT achieves superior translation quality compared to the baselines. Furthermore, we investigate the potential of progressively accumulating generated pairs at test time to build and reuse a demonstration pool. Our implementation is publicly available at https://github.com/aiclaudev/DAT.
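The test-time accumulation idea mentioned at the end of the abstract can be pictured as a growing pool: each generated pair is stored as it is produced, and later inputs retrieve the most relevant stored pairs instead of generating from scratch. The class below is a hypothetical sketch of that workflow; the retrieval metric (token overlap) and the `DemonstrationPool` interface are assumptions for illustration, not the paper's API.

```python
class DemonstrationPool:
    """Accumulates generated (source, target) pairs at test time and
    retrieves the most relevant ones for each new input."""

    def __init__(self):
        self.pairs = []  # list of (source, target) tuples

    def add(self, src: str, tgt: str) -> None:
        """Store a newly generated pair for reuse on later inputs."""
        self.pairs.append((src, tgt))

    def retrieve(self, query: str, k: int = 2):
        """Return up to k stored pairs ranked by token overlap with the query."""
        def overlap(pair):
            qs = set(query.lower().split())
            ps = set(pair[0].lower().split())
            return len(qs & ps) / max(len(qs | ps), 1)
        return sorted(self.pairs, key=overlap, reverse=True)[:k]
```

In use, the pool starts empty on the first test input and grows monotonically, so later inputs benefit from every pair generated earlier in the run.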