🤖 AI Summary
The Korean financial large language model (LLM) ecosystem suffers from a lack of systematic evaluation benchmarks and high-quality, open-source resources.
Method: We run an open leaderboard whose benchmark spans multiple-choice question answering (MCQA) and open-ended QA tasks, and we combine domain-specific data cleaning, synthetic data generation, and instruction fine-tuning to establish a transparent, reproducible training paradigm.
Contribution/Results: We introduce the first open leaderboard for Korean financial LLMs, evaluated on a closed benchmark of five financial MCQA categories and one open-ended QA task, alongside an 80K-sample high-quality financial instruction dataset. We also release Won, a production-ready, commercially licensable, and reproducible domain-specialized LLM. Empirical analysis of 1,119 model submissions on our leaderboard identifies effective training strategies. Won achieves state-of-the-art performance across multiple Korean financial NLP tasks, filling a critical gap in the open Korean financial LLM landscape.
📝 Abstract
In this work, we present the first open leaderboard for evaluating Korean large language models focused on finance. Operated for about eight weeks, the leaderboard evaluated 1,119 submissions on a closed benchmark covering five MCQA categories (finance and accounting, stock price prediction, domestic company analysis, financial markets, and financial agent tasks) and one open-ended QA task. Building on insights from these evaluations, we release an open instruction dataset of 80k instances and summarize widely used training strategies observed among top-performing models. Finally, we introduce Won, a fully open and transparent LLM built using these best practices. We hope our contributions help advance the development of better and safer financial LLMs for Korean and other languages.