🤖 AI Summary
This study addresses the challenge of machine translation for low-resource languages, where the scarcity of high-quality parallel data hinders effective adaptation of large language models (LLMs). For the first time, it scales in-context learning (ICL) from few-shot settings to contexts containing thousands of examples, leveraging LLMs with million-token context windows. The authors systematically evaluate the injection of diverse corpora, including monolingual, instruction-based, and parallel data, at inference time. Experiments on Javanese and Sundanese reveal that translation quality depends nonlinearly on the number of in-context examples: gains saturate quickly and can even degrade beyond certain thresholds, with the scaling behavior heavily influenced by corpus type. Notably, specific forms of monolingual supervision can match the effectiveness of parallel data, highlighting both the promise and the limitations of long-context ICL for low-resource translation.
📝 Abstract
Building machine translation (MT) systems for low-resource languages is notably difficult due to the scarcity of high-quality data. Although Large Language Models (LLMs) have improved MT system performance, adapting them to lesser-represented languages remains challenging. In-context learning (ICL) may offer novel ways to adapt LLMs for low-resource MT by conditioning models on demonstrations at inference time. In this study, we explore scaling low-resource machine translation ICL beyond the few-shot setting to thousands of examples with long-context models. We scale the in-context token budget to 1M tokens and compare three types of training corpora used as in-context supervision: monolingual unsupervised data, instruction-style data, and parallel data (English--target and Indonesian--target). Our experiments on Javanese and Sundanese show that gains from additional context saturate quickly and can degrade near the maximum context window, with scaling behavior strongly dependent on corpus type. Notably, some forms of monolingual supervision can be competitive with parallel data, despite the latter offering additional supervision. Overall, our results characterize the effective limits and corpus-type sensitivity of long-context ICL for low-resource MT, highlighting that larger context windows do not necessarily yield proportional quality gains.
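To make the setup concrete, the core mechanic is packing demonstration pairs into a single long prompt until a token budget is exhausted. The sketch below is illustrative only: the prompt template, the whitespace-based token estimate, and the sample Javanese pairs are assumptions, not the paper's actual preprocessing or tokenizer.

```python
# Minimal sketch of many-shot ICL prompt construction under a token budget.
# All names and the template here are hypothetical, not the paper's setup.

def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: ~1 token per whitespace word.
    return len(text.split())

def build_icl_prompt(pairs, source_sentence, token_budget=1_000_000):
    """Pack (source, target) demonstration pairs into the prompt until the
    token budget is exhausted, then append the sentence to translate."""
    header = "Translate English to Javanese.\n\n"
    footer = f"English: {source_sentence}\nJavanese:"
    used = approx_tokens(header) + approx_tokens(footer)
    shots = []
    for src, tgt in pairs:
        demo = f"English: {src}\nJavanese: {tgt}\n\n"
        cost = approx_tokens(demo)
        if used + cost > token_budget:
            break  # budget reached; beyond this, gains saturate anyway
        shots.append(demo)
        used += cost
    return header + "".join(shots) + footer

demos = [("Good morning.", "Sugeng enjing."), ("Thank you.", "Matur nuwun.")]
prompt = build_icl_prompt(demos, "How are you?", token_budget=50)
```

Swapping the demonstration corpus (monolingual text, instruction data, or parallel pairs) while holding the budget fixed is what lets the study isolate corpus-type effects from context-length effects.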