🤖 AI Summary
This work addresses the limitations of current large language models in handling Korean. These limitations stem from low-quality training data and a lack of cultural alignment, and they hinder the models' ability to capture Korea-specific values, commonsense knowledge, and nuanced emotional expressions. To overcome these challenges, we propose Mi:dm 2.0, a bilingual large language model that systematically integrates Korean sociocultural commonsense and reasoning patterns. Mi:dm 2.0 combines high-quality data curation, synthetic data generation, a curriculum learning–guided data mixing strategy, and a Korean-optimized tokenizer to achieve deep contextual understanding of local nuances. The model is released under the MIT License in two variants, a general-purpose Base model (11.5B parameters) and a lightweight Mini model (2.3B parameters). It achieves state-of-the-art results on Korean-specific benchmarks, including top-tier zero-shot results on KMMLU, and advances the development of the K-intelligence ecosystem.
📝 Abstract
We introduce Mi:dm 2.0, a bilingual large language model (LLM) specifically engineered to advance Korea-centric AI. The model goes beyond Korean text processing by integrating the values, reasoning patterns, and commonsense knowledge inherent to Korean society, which enables nuanced understanding of cultural contexts, emotional subtleties, and real-world scenarios, so that it can generate reliable and culturally appropriate responses. To address the limitations of existing LLMs, which often stem from insufficient or low-quality Korean data and a lack of cultural alignment, Mi:dm 2.0 emphasizes data quality through a comprehensive pipeline that includes proprietary data cleansing, high-quality synthetic data generation, strategic data mixing with curriculum learning, and a custom Korean-optimized tokenizer that improves efficiency and coverage. To realize this vision, we offer two complementary configurations: Mi:dm 2.0 Base (11.5B parameters), a general-purpose model built with a depth-up scaling strategy, and Mi:dm 2.0 Mini (2.3B parameters), optimized for resource-constrained environments and specialized tasks. Mi:dm 2.0 achieves state-of-the-art performance on Korean-specific benchmarks, with top-tier zero-shot results on KMMLU and strong internal evaluation results across language, humanities, and social science tasks. The Mi:dm 2.0 lineup is released under the MIT license to support extensive research and commercial use. By offering accessible, high-performance Korea-centric LLMs, KT aims to accelerate AI adoption across Korean industries, public services, and education, strengthen the Korean AI developer community, and lay the groundwork for the broader vision of K-intelligence. Our models are available at https://huggingface.co/K-intelligence. For technical inquiries, please contact midm-llm@kt.com.
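The abstract mentions a depth-up scaling strategy for Mi:dm 2.0 Base but does not give its details. The sketch below shows the layer-stacking arithmetic of the SOLAR-style depth up-scaling (DUS) scheme as one plausible reading, assuming Mi:dm 2.0 follows that recipe. It copies an n-layer stack twice, drops the last m layers from the first copy and the first m layers from the second, and concatenates them into 2(n − m) layers. The function name `depth_up_scale` and the layer counts (32 layers, m = 8, the values reported for SOLAR 10.7B) are illustrative assumptions, not Mi:dm 2.0's actual configuration.

```python
from typing import List


def depth_up_scale(base_layers: List[str], m: int) -> List[str]:
    """Hypothetical sketch of SOLAR-style depth up-scaling (DUS).

    Layers are represented by string labels here. In a real model they would
    be transformer blocks whose weights are duplicated before further training.
    """
    n = len(base_layers)
    if not 0 <= m < n:
        raise ValueError("m must satisfy 0 <= m < n")
    bottom = base_layers[: n - m]  # first copy with its last m layers removed
    top = base_layers[m:]          # second copy with its first m layers removed
    return bottom + top            # 2 * (n - m) layers in total


# Assumed sizes borrowed from the SOLAR 10.7B example, not from Mi:dm 2.0.
base = [f"layer_{i}" for i in range(32)]
scaled = depth_up_scale(base, m=8)
print(len(scaled))             # 48
print(scaled[23], scaled[24])  # layer_23 layer_8
```

In the SOLAR recipe, the up-scaled stack is then continually pretrained to recover from the seam between the two copies; whether and how Mi:dm 2.0 does this is not stated in the abstract.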