CycleDistill: Bootstrapping Machine Translation using LLMs with Cyclical Distillation

πŸ“… 2025-06-24
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the performance bottleneck in machine translation (MT) for low-resource languages caused by scarce parallel corpora, this paper proposes CycleDistill, a few-shot MT framework that combines large language models (LLMs) with cyclical knowledge distillation. CycleDistill requires only 1–4 demonstration examples: it iteratively generates synthetic parallel data from monolingual corpora using an LLM, then fine-tunes that same model on the generated data across multiple rounds. The authors also study incorporating softmax activations into the distillation process, observing mild improvements in translation quality. Experiments on three Indian languages show that CycleDistill improves over a few-shot baseline by more than 20–30 chrF points on average, with substantial gains arriving as early as the first iteration. This establishes an efficient, lightweight paradigm for low-resource MT that needs no annotated parallel data beyond a handful of few-shot examples.

πŸ“ Abstract
Large language models (LLMs), despite their ability to perform few-shot machine translation (MT), often lag behind dedicated MT systems trained on parallel corpora, which are crucial for high-quality translation. However, parallel corpora are often scarce or non-existent for low-resource languages. In this paper, we propose CycleDistill, a bootstrapping approach that leverages LLMs and few-shot translation to obtain high-quality MT systems. CycleDistill iteratively generates synthetic parallel corpora from monolingual corpora via zero- or few-shot MT, then fine-tunes the data-generating model on those corpora. CycleDistill needs no parallel corpora beyond 1 to 4 few-shot examples; in our experiments on three Indian languages, relying solely on monolingual corpora, it achieves high-quality machine translation, improving upon a few-shot baseline model by over 20-30 chrF points on average in the first iteration. We also study the effect of leveraging softmax activations during the distillation process and observe mild improvements in translation quality.
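The abstract's generate-then-fine-tune cycle can be sketched in a few lines. This is a toy illustration of the loop's structure, not the authors' implementation: here a "model" is just a source-to-target word table standing in for an LLM, `translate` stands in for few-shot generation, and `fine_tune` stands in for distillation on the synthetic corpus. All names are assumptions.

```python
# Toy sketch of a CycleDistill-style loop. The "model" is a word-level
# lookup table standing in for an LLM; real systems would use neural
# generation and gradient-based fine-tuning instead.

def translate(model, sentence):
    # Stand-in for zero-/few-shot MT: translate known words, copy unknowns.
    return " ".join(model.get(w, w) for w in sentence.split())

def fine_tune(model, synthetic_pairs):
    # Stand-in for distillation: absorb word pairs from the synthetic corpus.
    updated = dict(model)
    for src, tgt in synthetic_pairs:
        for s, t in zip(src.split(), tgt.split()):
            updated.setdefault(s, t)
    return updated

def cycle_distill(model, monolingual_corpus, iterations=3):
    for _ in range(iterations):
        # 1. Generate a synthetic parallel corpus from monolingual text.
        synthetic = [(s, translate(model, s)) for s in monolingual_corpus]
        # 2. Fine-tune the same model on the data it just generated.
        model = fine_tune(model, synthetic)
    return model
```

The key design point the abstract emphasizes is that the same model both produces the synthetic corpus and is refined on it, so no external parallel data enters the loop.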
Problem

Research questions and friction points this paper is trying to address.

Improving machine translation without parallel corpora
Leveraging LLMs for low-resource language translation
Cyclical distillation to boost translation quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cyclical distillation boosts MT without parallel corpora
LLMs generate synthetic data for iterative fine-tuning
Softmax activations mildly enhance translation quality
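The card only states that softmax activations are leveraged during distillation. A generic way to do this, shown here as an assumption rather than the paper's exact objective, is to train the student against the teacher's softened output distribution instead of hard token labels:

```python
import math

def softmax(logits, temperature=1.0):
    # Convert raw logits to a probability distribution (numerically stable).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def soft_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    # Cross-entropy of the student's distribution against the teacher's
    # soft targets; this exposes the teacher's full output distribution
    # rather than a single hard label.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

When student and teacher logits match, the loss reduces to the teacher distribution's entropy; any mismatch adds a KL-divergence penalty on top, which is what drives the student toward the teacher.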
πŸ”Ž Similar Papers
No similar papers found.