SemiAdapt and SemiLoRA: Efficient Domain Adaptation for Transformer-based Low-Resource Language Translation with a Case Study on Irish

πŸ“… 2025-10-21
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the high computational cost of domain adaptation and the scarcity of high-quality labeled data for low-resource languages such as Irish, this paper proposes SemiAdapt and SemiLoRA, two semi-supervised, inference-efficient fine-tuning methods. SemiLoRA combines semi-supervised learning with Low-Rank Adaptation (LoRA), training small adapter layers instead of the full model to enable lightweight domain adaptation. Domains are assigned via embedding-based inference, which proves especially effective on larger, noisier corpora. Experiments on Irish machine translation show that SemiAdapt can outperform full-domain fine-tuning, while SemiLoRA matches or even exceeds full-model fine-tuning despite training only a small fraction of the parameters. All Irish translation models developed in the work are publicly released.

πŸ“ Abstract
Fine-tuning is widely used to tailor large language models for specific tasks such as neural machine translation (NMT). However, leveraging transfer learning is computationally expensive when fine-tuning large multilingual models with billions of parameters, thus creating a barrier to entry for researchers working on low-resource domains such as Irish translation. Parameter-efficient fine-tuning (PEFT) bridges this gap by training on a fraction of the original model parameters, with the Low-Rank Adaptation (LoRA) approach introducing small, trainable adapter layers. We introduce SemiAdapt and SemiLoRA as semi-supervised inference-efficient approaches that strengthen domain adaptation and lead to improved overall performance in NMT. We demonstrate that SemiAdapt can outperform full-domain fine-tuning, while most notably, SemiLoRA can propel PEFT methods to match or even outperform full-model fine-tuning. We further evaluate domain-by-dataset fine-tuning and demonstrate that our embedding-based inference methods perform especially well on larger and noisier corpora. All Irish translation models developed in this work are released as open resources. These methods aim to make high-quality domain adaptation and fine-tuning more accessible to researchers working with low-resource languages.
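The abstract describes LoRA as adding small, trainable adapter layers while the original weights stay frozen. A minimal sketch of that idea (illustrative only, not the paper's released code; all shapes and names are assumptions):

```python
import numpy as np

# LoRA-style adapter sketch: the frozen pretrained weight W is augmented
# with a trainable low-rank update B @ A, so only r * (d_in + d_out)
# parameters are trained instead of d_in * d_out.
rng = np.random.default_rng(0)
d_in, d_out, r = 512, 512, 8

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init
                                            # so the adapter starts as a no-op

def lora_forward(x):
    # Frozen path plus the low-rank adapter path.
    return W @ x + B @ (A @ x)

full_params = W.size
lora_params = A.size + B.size
print(lora_params / full_params)  # fraction of trainable parameters: 0.03125
```

With rank r = 8 on a 512x512 layer, the adapter trains about 3% of the layer's parameters, which is the kind of reduction that makes fine-tuning billion-parameter multilingual models feasible on modest hardware.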
Problem

Research questions and friction points this paper is trying to address.

Efficient domain adaptation for low-resource language translation
Parameter-efficient fine-tuning for computationally expensive models
Improving Irish translation with semi-supervised inference methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

SemiAdapt and SemiLoRA enable efficient domain adaptation
SemiLoRA matches full-model fine-tuning performance
Embedding-based inference excels on noisy corpora
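The "embedding-based inference" idea above can be pictured as routing each sentence to a domain by embedding similarity. A hedged sketch of one plausible reading (my illustration, not the authors' method; the centroid scheme and toy embeddings are assumptions):

```python
import numpy as np

# Toy embedding-based domain inference: represent each domain by the
# centroid of its sentence embeddings, then route an unlabeled sentence
# to the domain whose centroid is most cosine-similar.
def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def domain_centroids(embeddings_by_domain):
    # embeddings_by_domain: {domain_name: array of shape (n_i, d)}
    return {d: normalize(e.mean(axis=0)) for d, e in embeddings_by_domain.items()}

def infer_domain(sentence_emb, centroids):
    sims = {d: float(normalize(sentence_emb) @ c) for d, c in centroids.items()}
    return max(sims, key=sims.get)

# Hypothetical 3-d "embeddings" for two domains.
by_domain = {
    "legal":  np.array([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]),
    "health": np.array([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]),
}
cents = domain_centroids(by_domain)
print(infer_domain(np.array([0.95, 0.15, 0.05]), cents))  # -> legal
```

Because the routing decision is a cheap nearest-centroid lookup rather than a model forward pass, this style of inference scales well to the large, noisy corpora the paper highlights.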
Josh McGiff
Department of Computer Science and Information Systems, University of Limerick, Ireland
Nikola S. Nikolov
Associate Professor, Department of Computer Science and Information Systems, University of Limerick
Machine Learning · NLP · Graph Drawing