🤖 AI Summary
Problem: Existing open-source large language models (LLMs) exhibit suboptimal Korean language capabilities; the challenge is to strengthen Korean performance without compromising English performance.
Method: We propose Llama-3-Motif—a 102B-parameter model built upon the Llama 3 architecture—that integrates LlamaPro’s structured expansion with Masked Structure Growth (MSG), enabling efficient, scalable parameter growth without altering the core Transformer architecture. Training was conducted on the MoAI ultra-large-scale GPU cluster, with fine-grained bilingual data ratio tuning to balance English and Korean supervision.
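The core idea of LlamaPro-style structured expansion is to interleave newly added Transformer blocks whose output projections are zero-initialized, so each new block is an identity map at insertion time and the grown model initially computes exactly the same function as the original. The toy sketch below illustrates that function-preserving property on miniature residual blocks; it is an assumption-laden illustration (the `Block` class, `expand` helper, and expansion interval are invented for this example), not the authors' implementation.

```python
import numpy as np

# Hypothetical minimal sketch of LlamaPro-style depth expansion;
# not the actual Llama-3-Motif training code.

class Block:
    """Toy residual block: x + W_out @ relu(W_in @ x)."""
    def __init__(self, d, zero_init=False, rng=None):
        rng = rng or np.random.default_rng(0)
        self.w_in = rng.standard_normal((d, d)) * 0.1
        # Zero-initializing the output projection makes the whole
        # block an identity map at insertion time (function-preserving).
        self.w_out = (np.zeros((d, d)) if zero_init
                      else rng.standard_normal((d, d)) * 0.1)
        self.frozen = False

    def forward(self, x):
        return x + self.w_out @ np.maximum(self.w_in @ x, 0.0)

def expand(blocks, every=2):
    """Interleave one zero-initialized (identity) block after every
    `every` original blocks; mark originals frozen (in real training,
    their gradients would be disabled)."""
    d = blocks[0].w_in.shape[0]
    grown = []
    for i, b in enumerate(blocks, 1):
        b.frozen = True
        grown.append(b)
        if i % every == 0:
            grown.append(Block(d, zero_init=True))
    return grown

def run(blocks, x):
    for b in blocks:
        x = b.forward(x)
    return x

rng = np.random.default_rng(42)
d = 4
model = [Block(d, rng=rng) for _ in range(4)]
x = rng.standard_normal(d)

y_before = run(model, x)
expanded = expand(model)            # 4 blocks -> 6 blocks
y_after = run(expanded, x)
assert np.allclose(y_before, y_after)  # expansion preserves the function
```

Only the new blocks would then receive gradient updates on the bilingual corpus, which is what makes this style of parameter growth efficient: the original capabilities (here, English) are protected by construction while capacity for the new language is added.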
Contribution/Results: Llama-3-Motif achieves state-of-the-art Korean performance on major Korean benchmarks—surpassing all prior open-source models and approaching GPT-4—while retaining strong English capabilities. This work represents the first demonstration of high-fidelity, balanced bilingual (English–Korean) enhancement in a 100B-scale open-source LLM, establishing a practical paradigm for lightweight, efficient multilingual model scaling.
📝 Abstract
We introduce Llama-3-Motif, a language model consisting of 102 billion parameters, specifically designed to enhance Korean capabilities while retaining strong performance in English. Developed on the Llama 3 architecture, Llama-3-Motif employs advanced training techniques, including LlamaPro and Masked Structure Growth, to effectively scale the model without altering its core Transformer architecture. Using the MoAI platform for efficient training across hyperscale GPU clusters, we optimized Llama-3-Motif on a carefully curated dataset that maintains a balanced ratio of Korean and English data. Llama-3-Motif demonstrates strong performance on Korean-specific benchmarks, outperforming existing open-source models and achieving results comparable to GPT-4.