Mi:dm 2.0 Korea-centric Bilingual Language Models

📅 2026-01-14
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the limitations of current large language models in handling Korean, which stem from insufficient or low-quality training data and a lack of cultural alignment, and which hinder their ability to capture Korea-specific values, commonsense knowledge, and nuanced emotional expressions. To overcome these challenges, we propose Mi:dm 2.0, a bilingual large language model that integrates Korean sociocultural commonsense and reasoning patterns. Through high-quality data curation, synthetic data generation, a data mixing strategy guided by curriculum learning, and a Korean-optimized tokenizer, Mi:dm 2.0 is designed for deep contextual understanding of local nuances. Released under the MIT License in a general-purpose Base variant (11.5B parameters) and a lightweight Mini variant (2.3B parameters), the model reports state-of-the-art performance on Korean-specific benchmarks, including top-tier zero-shot results on KMMLU, in support of the broader K-intelligence ecosystem.

📝 Abstract
We introduce Mi:dm 2.0, a bilingual large language model (LLM) specifically engineered to advance Korea-centric AI. This model goes beyond Korean text processing by integrating the values, reasoning patterns, and commonsense knowledge inherent to Korean society, enabling nuanced understanding of cultural contexts, emotional subtleties, and real-world scenarios to generate reliable and culturally appropriate responses. To address limitations of existing LLMs, often caused by insufficient or low-quality Korean data and lack of cultural alignment, Mi:dm 2.0 emphasizes robust data quality through a comprehensive pipeline that includes proprietary data cleansing, high-quality synthetic data generation, strategic data mixing with curriculum learning, and a custom Korean-optimized tokenizer to improve efficiency and coverage. To realize this vision, we offer two complementary configurations: Mi:dm 2.0 Base (11.5B parameters), built with a depth-up scaling strategy for general-purpose use, and Mi:dm 2.0 Mini (2.3B parameters), optimized for resource-constrained environments and specialized tasks. Mi:dm 2.0 achieves state-of-the-art performance on Korean-specific benchmarks, with top-tier zero-shot results on KMMLU and strong internal evaluation results across language, humanities, and social science tasks. The Mi:dm 2.0 lineup is released under the MIT license to support extensive research and commercial use. By offering accessible and high-performance Korea-centric LLMs, KT aims to accelerate AI adoption across Korean industries, public services, and education, strengthen the Korean AI developer community, and lay the groundwork for the broader vision of K-intelligence. Our models are available at https://huggingface.co/K-intelligence. For technical inquiries, please contact midm-llm@kt.com.
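The abstract states only that Mi:dm 2.0 Base is built with a depth-up scaling strategy and does not spell out the procedure. As a rough illustration, the sketch below shows one commonly described form of depth-up scaling: duplicate a stack of transformer layers, drop the last few layers from the first copy and the first few from the second, and concatenate the two. The function name `depth_up_scale`, the `overlap` parameter, and the 8-layer toy model are hypothetical and chosen for illustration; they are not taken from the paper, and the actual Mi:dm 2.0 recipe may differ.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def depth_up_scale(layers: Sequence[T], overlap: int) -> List[T]:
    """Build a deeper layer stack from two trimmed copies of `layers`.

    Hypothetical helper (not from the paper): the first copy loses its
    last `overlap` layers, the second copy loses its first `overlap`
    layers, and the two are concatenated, giving 2 * (n - overlap) layers.
    """
    n = len(layers)
    if not 0 <= overlap < n:
        raise ValueError("overlap must satisfy 0 <= overlap < len(layers)")
    first_copy = list(layers[: n - overlap])   # layers 0 .. n-overlap-1
    second_copy = list(layers[overlap:])       # layers overlap .. n-1
    return first_copy + second_copy


# Toy example: an 8-layer "model" represented only by layer names.
base_layers = [f"layer_{i}" for i in range(8)]
scaled_layers = depth_up_scale(base_layers, overlap=2)
print(len(base_layers), "->", len(scaled_layers))
print(scaled_layers)
```

In a real model the duplicated layers would be deep-copied so the two halves can diverge during continued pretraining; the sketch only tracks layer order.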
Problem

Research questions and friction points this paper is trying to address.

Korea-centric AI
bilingual language models
cultural alignment
low-quality Korean data
commonsense knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Korea-centric LLM
cultural alignment
high-quality synthetic data
Korean-optimized tokenizer
curriculum learning
Authors
Donghoon Shin
Tech. Innovation Group, KT
Sejung Lee
Tech. Innovation Group, KT
Soonmin Bae
NAVER Clova
Artificial Intelligence, Computer Vision, Computer Graphics
Hwijung Ryu
Tech. Innovation Group, KT
Changwon Ok
Tech. Innovation Group, KT
Hoyoun Jung
School of Integrated Technology, Gwangju Institute of Science and Technology
Reinforcement Learning, Natural Language Processing
Hyesung Ji
Tech. Innovation Group, KT
Jeehyun Lim
Tech. Innovation Group, KT
Jehoon Lee
Tech. Innovation Group, KT
Ji-Eun Han
Tech. Innovation Group, KT
Jisoo Baik
Tech. Innovation Group, KT
Mihyeong Kim
Tech. Innovation Group, KT
Riwoo Chung
Tech. Innovation Group, KT
Seongmin Lee
Tech. Innovation Group, KT
Wonjae Park
Tech. Innovation Group, KT
Yoonseok Heo
Sogang University
Natural Language Processing, Neural Machine Translation, Multimodal Neural Machine Translation, NLG
Youngkyung Seo
Tech. Innovation Group, KT
Seyoun Won
Tech. Innovation Group, KT
Boeun Kim
Tech. Innovation Group, KT
Cheolhun Heo
Tech. Innovation Group, KT
Eunkyeong Lee
Tech. Innovation Group, KT
Honghee Lee
Tech. Innovation Group, KT
Hyeongju Ju
Tech. Innovation Group, KT
Hyeontae Seo
Tech. Innovation Group, KT
Jeongyong Shim
Tech. Innovation Group, KT
Jisoo Lee
Indiana University
Human-AI collaboration, Cybersecurity
Jun-Seok Koh
Tech. Innovation Group, KT
Junwoo Kim
Tech. Innovation Group, KT
Minho Lee
Tech. Innovation Group, KT
Minji Kang
Tech. Innovation Group, KT
Minju Kim
Tech. Innovation Group, KT
Sangha Nam
KAIST
NLP, Semantic Web, Deep Learning, Knowledge Representation
S. Park
Tech. Innovation Group, KT
Taehyeong Kim
Tech. Innovation Group, KT
Euijai Ahn
Tech. Innovation Group, KT
Hong Seok Jeung
Tech. Innovation Group, KT
Jisu Shin
Tech. Innovation Group, KT
Jiyeon Kim
Assistant Professor at Yale University
Cancer Biology, Cancer metabolism, Lung cancer
Seonyeong Song
Tech. Innovation Group, KT
Seung Hyun Kong
Tech. Innovation Group, KT
Sukjin Hong
Tech. Innovation Group, KT
Taeyang Yun
Tech. Innovation Group, KT
Yu-Seon Kim
Tech. Innovation Group, KT
A-Hyun Lee
Tech. Innovation Group, KT
Chae-Jeong Lee
Tech. Innovation Group, KT
Hye-Won Yu
Tech. Innovation Group, KT
Ji-Hyun Ahn
Tech. Innovation Group, KT
Songseong Kim
Unknown affiliation
Sun-Woo Jung
Tech. Innovation Group, KT
Eunju Kim
Tech. Innovation Group, KT
Eunji Ha
Tech. Innovation Group, KT
J. Baek
Tech. Innovation Group, KT
YunGyoo Lee
Tech. Innovation Group, KT
Wanjin Park
Tech. Innovation Group, KT
Jeong Yeop Kim
Tech. Innovation Group, KT
Eun Mi Kim
Tech. Innovation Group, KT
Hyoungjun Park
Tech. Innovation Group, KT
Jungwon Yoon
Professor, Gwangju Institute of Science and Technology
Rehabilitation robotics, Magnetic Particle Imaging, Nano Robotic Navigation, Exoskeleton, VR-based Automation
Minsung Noh
Tech. Innovation Group, KT
Myunggyo Oh
Tech. Innovation Group, KT
W. Lee
Tech. Innovation Group, KT
Yunjin Park
Tech. Innovation Group, KT
Young S. Kwon
Tech. Innovation Group, KT
Hyun Keun Kim
Tech. Innovation Group, KT
Jieun Lee
Tech. Innovation Group, KT
Ye Eun Park
Tech. Innovation Group, KT