🤖 AI Summary
This study investigates the underexplored phenomenon of asymmetric transfer effects across tasks and languages in multilingual large language models (LLMs). We conduct single-source task-language fine-tuning via PEFT/LoRA on diverse open-source LLMs spanning varying scales and architectures, and systematically evaluate zero-shot transfer performance across all target task-language combinations. Our findings reveal (1) robust positive cross-lingual transfer within the same task but frequent and substantial performance degradation in cross-task transfer, and (2) a stable donor–recipient hierarchy among language-task pairs. Based on these insights, we propose a risk-aware fine-tuning strategy and a model specialisation pathway. Quantitative analysis disentangles three distinct transfer mechanisms (task-driven, language-driven, and joint task-language transfer), providing both theoretical grounding and a practical methodology for joint multilingual and multitask optimisation.
📝 Abstract
Large language models (LLMs) perform strongly across tasks and languages, yet how improvements in one task or language affect other tasks, languages, and task-language combinations remains poorly understood. We conduct a controlled PEFT/LoRA study across multiple open-weight LLM families and sizes, treating task and language as transfer axes while conditioning on model family and size; we fine-tune each model on a single task-language source and measure transfer as the percentage-point change versus its baseline score when evaluated on all other task-language target pairs. We decompose transfer into (i) Matched-Task (Cross-Language), (ii) Matched-Language (Cross-Task), and (iii) Cross-Task (Cross-Language) regimes. We uncover two consistent patterns. First, a pronounced on-task vs. off-task asymmetry: Matched-Task (Cross-Language) transfer is reliably positive, whereas off-task transfer often incurs collateral degradation. Second, a stable donor-recipient structure across languages and tasks (hub donors vs. brittle recipients). We outline implications for risk-aware fine-tuning and model specialisation.
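The measurement protocol above can be sketched in a few lines: after fine-tuning on one (task, language) source, every (task, language) target is scored, the percentage-point change versus the untuned baseline is recorded, and each source-target pair is classified into one of the three regimes. The task names, language codes, and all scores below are illustrative placeholders, not the paper's data.

```python
# Hypothetical sketch of the transfer-measurement protocol: fine-tune on one
# (task, language) source, score every (task, language) target, and report
# percentage-point deltas vs. baseline, classified by transfer regime.
# All task/language names and scores are illustrative placeholders.
from itertools import product

tasks = ["qa", "nli"]            # assumed task identifiers
languages = ["en", "de", "zh"]   # assumed language codes

# baseline[(task, lang)] -> accuracy (%) of the untuned model (placeholders)
baseline = {(t, l): 50.0 for t, l in product(tasks, languages)}

# tuned[(task, lang)] -> accuracy (%) after fine-tuning on source ("qa", "en")
tuned = dict(baseline)
tuned[("qa", "en")] = 62.0   # on-source gain
tuned[("qa", "de")] = 56.0   # Matched-Task (Cross-Language): positive transfer
tuned[("nli", "en")] = 47.0  # Matched-Language (Cross-Task): degradation

def transfer_regime(source, target):
    """Classify a (source, target) pair into one of the three regimes."""
    same_task = source[0] == target[0]
    same_lang = source[1] == target[1]
    if same_task and same_lang:
        return "on-source"
    if same_task:
        return "Matched-Task (Cross-Language)"
    if same_lang:
        return "Matched-Language (Cross-Task)"
    return "Cross-Task (Cross-Language)"

source = ("qa", "en")
for target in product(tasks, languages):
    delta_pp = tuned[target] - baseline[target]  # percentage-point change
    print(f"{target}: {transfer_regime(source, target)}: {delta_pp:+.1f} pp")
```

With real evaluation scores in place of the placeholders, aggregating `delta_pp` within each regime yields the on-task vs. off-task asymmetry and the donor-recipient structure the abstract reports.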