🤖 AI Summary
To address the scarcity of labeled target-language data in transfer learning for low-resource languages (e.g., Korean), this paper proposes a cost-effective, high-efficiency knowledge transfer method leveraging Statistical Machine Translation (SMT) phrase alignment data (PAD). We first systematically uncover the synergistic relationship between PAD and Korean syntactic structure; then design a multi-stage data augmentation paradigm that complementarily fuses PAD with conventional supervised data; and finally introduce a syntax-aware evaluation strategy. Our approach integrates SMT alignment, transfer learning frameworks, and lightweight fusion mechanisms—requiring no additional human annotation. Experiments across multiple Korean NLP tasks demonstrate an average accuracy improvement of 4.2% over strong baselines trained on equivalent amounts of labeled data, while reducing data construction costs by approximately 60%. This work establishes a scalable, cost-efficient pathway for cross-lingual transfer in low-resource settings.
📝 Abstract
Transfer learning leverages the abundance of English data to address the scarcity of resources in modeling non-English languages, such as Korean. In this study, we explore the potential of Phrase Aligned Data (PAD) from standardized Statistical Machine Translation (SMT) to enhance the efficiency of transfer learning. Through extensive experiments, we demonstrate that PAD synergizes effectively with the syntactic characteristics of the Korean language, mitigating the weaknesses of SMT and significantly improving model performance. Moreover, we reveal that PAD complements traditional data construction methods and enhances their effectiveness when combined. This innovative approach not only boosts model performance but also suggests a cost-efficient solution for resource-scarce languages.