🤖 AI Summary
Training LLMs for low-resource languages is bottlenecked by high computational cost and poor scalability. To address this, we propose a lightweight adaptation paradigm for Persian: leveraging the Phi-3 Mini architecture, we design a synergistic training framework comprising (i) bilingual Tiny Stories "warm-up" embedding alignment, (ii) curriculum-driven monolingual-to-multilingual capability transfer, and (iii) parameter-efficient fine-tuning (PEFT) integrated with progressive instruction tuning. This approach substantially reduces GPU memory consumption and computational overhead while achieving performance competitive with significantly larger models on the Hugging Face Open Persian LLM Leaderboard. All models and code are publicly released, enabling reproducible, low-barrier adoption. Our work provides a scalable, resource-efficient technical pathway toward equitable AI access for low-resource languages.
📝 Abstract
The democratization of AI is currently hindered by the immense computational costs required to train Large Language Models (LLMs) for low-resource languages. This paper presents Persian-Phi, a 3.8B-parameter model that challenges the assumption that robust multilingual capabilities require massive model sizes or multilingual baselines. We demonstrate how Microsoft Phi-3 Mini -- originally a monolingual English model -- can be effectively adapted to Persian through a novel, resource-efficient curriculum learning pipeline. Our approach employs a unique "warm-up" stage using bilingual narratives (Tiny Stories) to align embeddings prior to heavy training, followed by continual pretraining and instruction tuning via Parameter-Efficient Fine-Tuning (PEFT). Despite its compact size, Persian-Phi achieves competitive results on the Open Persian LLM Leaderboard on Hugging Face. Our findings provide a validated, scalable framework for extending the reach of state-of-the-art LLMs to underrepresented languages with minimal hardware resources. The Persian-Phi model is publicly available at https://huggingface.co/amirakhlaghiqqq/PersianPhi.
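To illustrate why PEFT keeps the memory footprint of such an adaptation small, the sketch below shows the core LoRA arithmetic (a common PEFT method) in plain NumPy: the pretrained weight stays frozen while only two low-rank matrices are trained. The hidden size matches Phi-3 Mini (3072), but the rank, scaling factor, and initialization here are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# Hypothetical LoRA-style sketch; r and alpha are assumed, not from the paper.
d, r = 3072, 16          # Phi-3 Mini hidden size; assumed LoRA rank
alpha = 32               # assumed LoRA scaling factor

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))          # frozen pretrained weight (not updated)
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-init so
                                         # the adapter starts as a no-op

def lora_forward(x):
    # y = x W^T + (alpha / r) * x A^T B^T; only A and B would receive gradients
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

full_params = W.size                 # parameters a full fine-tune would update
lora_params = A.size + B.size        # parameters LoRA actually trains
print(f"trainable fraction: {lora_params / full_params:.4%}")
```

With these assumed shapes, the adapter trains roughly 1% of the parameters a full fine-tune of that layer would touch, which is the source of the GPU-memory savings the abstract refers to.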