SemiAdapt and SemiLoRA: Efficient Domain Adaptation for Transformer-based Low-Resource Language Translation with a Case Study on Irish

📅 2025-10-21
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
To address the high computational cost of domain adaptation and the scarcity of high-quality labeled data for low-resource languages such as Irish, this paper proposes SemiAdapt and SemiLoRA, two semi-supervised, parameter-efficient fine-tuning methods. SemiLoRA is the first to integrate semi-supervised learning with Low-Rank Adaptation (LoRA) within the Transformer architecture, enabling computationally efficient inference and lightweight domain adaptation. It incorporates embedding-layer enhancement and a noise-robust design to make better use of large-scale weakly labeled or noisy data. Experiments on Irish machine translation show that SemiAdapt can outperform full-domain fine-tuning, while SemiLoRA allows parameter-efficient fine-tuning to match or exceed full-model fine-tuning despite reducing trainable parameters by over 90%. All Irish translation models are publicly released.

πŸ“ Abstract
Fine-tuning is widely used to tailor large language models for specific tasks such as neural machine translation (NMT). However, leveraging transfer learning is computationally expensive when fine-tuning large multilingual models with billions of parameters, thus creating a barrier to entry for researchers working on low-resource domains such as Irish translation. Parameter-efficient fine-tuning (PEFT) bridges this gap by training on a fraction of the original model parameters, with the Low-Rank Adaptation (LoRA) approach introducing small, trainable adapter layers. We introduce SemiAdapt and SemiLoRA as semi-supervised inference-efficient approaches that strengthen domain adaptation and lead to improved overall performance in NMT. We demonstrate that SemiAdapt can outperform full-domain fine-tuning, while most notably, SemiLoRA can propel PEFT methods to match or even outperform full-model fine-tuning. We further evaluate domain-by-dataset fine-tuning and demonstrate that our embedding-based inference methods perform especially well on larger and noisier corpora. All Irish translation models developed in this work are released as open resources. These methods aim to make high-quality domain adaptation and fine-tuning more accessible to researchers working with low-resource languages.
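
The abstract notes that LoRA trains only a fraction of the original parameters by adding small adapter layers. The following is a minimal sketch of the standard LoRA formulation, in which a frozen weight matrix W is augmented with a low-rank update scaled by alpha / rank. The dimensions (1024 x 1024, rank 8) and the helper names are illustrative assumptions, not values or code from the paper.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in              # every entry of W is trained
    lora = rank * (d_in + d_out)     # B is d_out x rank, A is rank x d_in
    return full, lora


def lora_forward(x, W, A, B, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x) for a column vector x."""
    base = matmul(W, x)                  # frozen pretrained path
    update = matmul(B, matmul(A, x))     # trainable low-rank path
    scale = alpha / rank
    return [[b[0] + scale * u[0]] for b, u in zip(base, update)]


# Illustrative sizes only (assumed, not taken from the paper).
full, lora = lora_param_counts(d_out=1024, d_in=1024, rank=8)
fraction = lora / full
print(f"full={full} lora={lora} fraction={fraction:.4%}")

# With B initialised to zeros, the adapted layer reproduces the frozen layer.
W = [[1, 2], [3, 4]]
A = [[1, 0]]          # rank x d_in
B = [[0], [0]]        # d_out x rank, zero-initialised
y = lora_forward([[1], [1]], W, A, B, alpha=2.0, rank=1)
```

In this toy setting the adapter holds about 1.6% of the parameters of the full matrix. The reduction in a real model depends on which layers receive adapters and on the chosen rank.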
Problem

Research questions and friction points this paper is trying to address.

Efficient domain adaptation for low-resource language translation
Parameter-efficient fine-tuning for computationally expensive models
Improving Irish translation with semi-supervised inference methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

SemiAdapt and SemiLoRA enable efficient domain adaptation
SemiLoRA matches full-model fine-tuning performance
Embedding-based inference excels on noisy corpora
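
The summary does not say how the embedding-based inference step works. One plausible reading, sketched below purely as an assumption, is that an input sentence is assigned to the domain whose centroid embedding is closest by cosine similarity, after which a domain-specific model or adapter could be selected. The three-dimensional vectors, the domain names, and the function names are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def infer_domain(embedding, centroids):
    """Return the domain name whose centroid is most similar to the embedding."""
    return max(centroids, key=lambda name: cosine(embedding, centroids[name]))


# Toy sentence embeddings per domain (hypothetical; real ones would come
# from a sentence encoder and have hundreds of dimensions).
labeled = {
    "legal": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "health": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
centroids = {name: centroid(vecs) for name, vecs in labeled.items()}

# An unlabeled sentence is routed to its nearest domain at inference time.
predicted = infer_domain([0.7, 0.3, 0.1], centroids)
print(predicted)
```

Nearest-centroid assignment is used here only because it is the simplest embedding-based classifier. The paper may use a different assignment rule.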