🤖 AI Summary
This study investigates what cross-lingual adapters actually contribute to low-resource Creole machine translation. To test the common hypothesis that adapter-based knowledge transfer depends on linguistic relatedness (e.g., genealogical distance), we systematically evaluate adapter souping, cross-attention fine-tuning, and their integration with mBART. The results indicate that adapters function primarily as parameter regularizers rather than as encoders of linguistic knowledge: randomly initialized adapters perform comparably to those initialized from typologically or genealogically related languages, which challenges the prevailing assumption that transfer is driven by linguistic relatedness. Our approach improves substantially over baselines across three low-resource Creoles, and the gains do not covary meaningfully with linguistic relatedness. The core contribution is empirical evidence that regularization, not linguistic knowledge encoding, is the dominant mechanism behind adapter efficacy in this low-resource cross-lingual MT setting, which can inform adapter design for resource-constrained multilingual systems.
📝 Abstract
Cross-lingual transfer from related high-resource languages is a well-established strategy to enhance low-resource language technologies. Prior work suggests that adapters are promising for tasks such as low-resource machine translation (MT). In this work, we investigate an adapter souping method combined with cross-attention fine-tuning of a pre-trained MT model to leverage language transfer for three low-resource Creole languages, which exhibit relatedness to different language groups across distinct linguistic dimensions. Our approach improves performance substantially over baselines. However, we find that linguistic relatedness -- or even a lack thereof -- does not covary meaningfully with adapter performance. Surprisingly, our cross-attention fine-tuning approach appears equally effective with randomly initialized adapters, implying that the benefit of adapters in this setting lies in parameter regularization, and not in meaningful information transfer. We provide analysis supporting this regularization hypothesis. Our findings underscore that neural language processing involves many success factors, and that not all neural methods leverage linguistic knowledge in intuitive ways.
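As an illustration of the comparison described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that "adapter souping" means weight-space averaging of several source-language adapters, and that an adapter is a residual bottleneck layer; the abstract specifies neither. The function names (`init_adapter`, `soup_adapters`, `apply_adapter`), the dimensions, and the initialization scale are all hypothetical, and no training or fine-tuning step is shown.

```python
import numpy as np


def init_adapter(d_model, bottleneck, rng):
    """Create a hypothetical bottleneck adapter with random weights."""
    return {
        "down": rng.normal(0.0, 0.02, size=(d_model, bottleneck)),
        "up": rng.normal(0.0, 0.02, size=(bottleneck, d_model)),
    }


def soup_adapters(adapters, weights=None):
    """Average adapter parameters in weight space (assumed meaning of 'souping')."""
    if not adapters:
        raise ValueError("need at least one adapter to soup")
    if weights is None:
        weights = np.ones(len(adapters))
    weights = np.asarray(weights, dtype=float)
    if len(weights) != len(adapters):
        raise ValueError("one weight per adapter is required")
    weights = weights / weights.sum()  # normalize so the result is a convex combination
    return {
        name: sum(w * adapter[name] for w, adapter in zip(weights, adapters))
        for name in adapters[0]
    }


def apply_adapter(hidden, adapter):
    """Residual bottleneck: h + relu(h @ down) @ up."""
    return hidden + np.maximum(hidden @ adapter["down"], 0.0) @ adapter["up"]


rng = np.random.default_rng(0)

# Stand-ins for adapters trained on three source languages (here just random weights).
source_adapters = [init_adapter(8, 2, rng) for _ in range(3)]
souped = soup_adapters(source_adapters)

# The control condition from the abstract: an adapter with no source-language training.
random_baseline = init_adapter(8, 2, rng)

hidden = rng.normal(size=(4, 8))  # 4 token states of width 8
out_souped = apply_adapter(hidden, souped)
out_random = apply_adapter(hidden, random_baseline)
```

In an experiment of the kind the abstract describes, both `souped` and `random_baseline` would be inserted into the pre-trained MT model before cross-attention fine-tuning, and the resulting translation quality compared; the sketch only shows how the two starting points could be constructed.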