🤖 AI Summary
This study addresses the lack of reliable source language selection methods for cross-lingual transfer in low-resource African languages. Through a systematic evaluation of five embedding similarity metrics—including cosine distance, P@1, CSLS, and CKA—across 816 cross-lingual transfer experiments spanning 12 African languages, three NLP tasks, and three Africa-centric multilingual models, the work demonstrates that cosine distance and retrieval-based metrics (P@1, CSLS) effectively predict transfer performance (Spearman’s ρ = 0.4–0.6), matching the predictive power of URIEL typological features. In contrast, CKA exhibits negligible predictive ability (ρ ≈ 0.1). The paper further presents the first direct comparison between embedding-based metrics and linguistic typology, uncovering a Simpson’s paradox when aggregating results across models, thereby underscoring the necessity of validating metric efficacy separately for each model.
📝 Abstract
Cross-lingual transfer is essential for building NLP systems for low-resource African languages, but practitioners lack reliable methods for selecting source languages. We systematically evaluate five embedding similarity metrics across 816 transfer experiments spanning three NLP tasks, three African-centric multilingual models, and 12 languages from four language families. We find that cosine gap and retrieval-based metrics (P@1, CSLS) reliably predict transfer success ($\rho = 0.4$–$0.6$), while CKA shows negligible predictive power ($\rho \approx 0.1$). Critically, correlation signs reverse when pooling across models (Simpson's Paradox), so practitioners must validate per-model. Embedding metrics achieve comparable predictive power to URIEL linguistic typology. Our results provide concrete guidance for source language selection and highlight the importance of model-specific analysis.
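The evaluation pattern described above—scoring candidate source languages by an embedding similarity metric, then correlating those scores with observed transfer performance via Spearman's ρ—can be sketched as follows. This is a minimal toy illustration, not the paper's pipeline: the language codes, embedding dimensionality, and transfer scores are all hypothetical placeholders.

```python
# Toy sketch: rank candidate source languages by cosine similarity of
# (hypothetical) mean-pooled language embeddings, then measure how well
# that ranking predicts (hypothetical) downstream transfer scores.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical mean-pooled representations per language (dim = 64).
langs = ["swa", "hau", "yor", "amh", "zul", "ibo"]
emb = {lang: rng.normal(size=64) for lang in langs}

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

target = "swa"
sources = [lang for lang in langs if lang != target]

# Similarity of each candidate source language to the target.
sims = np.array([cosine_sim(emb[target], emb[s]) for s in sources])

# Placeholder transfer scores (e.g., task F1 after fine-tuning on each
# source and evaluating on the target) -- random here for illustration.
transfer_scores = rng.uniform(0.4, 0.8, size=len(sources))

# Spearman's rho: does the similarity ranking predict transfer success?
rho, pval = spearmanr(sims, transfer_scores)
print(f"Spearman rho = {rho:.2f} (p = {pval:.2f})")
```

In practice one would repeat this per model and per task, since, as the abstract notes, correlation signs can reverse when results are pooled across models.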