KFinEval-Pilot: A Comprehensive Benchmark Suite for Korean Financial Language Understanding

📅 2025-04-17
🤖 AI Summary
Problem: Existing large language models (LLMs) show clear weaknesses in Korean financial contexts, including limited domain-specific knowledge, weak legal reasoning, and poor detection of financial toxicity, and there is no systematic, domain-specific benchmark for evaluating them.
Method: We introduce KFinEval-Pilot, a multidimensional evaluation benchmark for Korean financial AI with over 1,000 curated questions covering three high-stakes areas: financial knowledge QA, legal reasoning, and financial toxicity identification. The questions are built with a hybrid pipeline that combines GPT-4-assisted semi-automatic generation with domain-expert validation.
Contribution/Results: An evaluation of a range of representative LLMs reveals trade-offs between task accuracy and output safety across model families. KFinEval-Pilot serves as an early diagnostic tool for locating model weaknesses and supporting the development of trustworthy Korean financial AI.

📝 Abstract
We introduce KFinEval-Pilot, a benchmark suite specifically designed to evaluate large language models (LLMs) in the Korean financial domain. Addressing the limitations of existing English-centric benchmarks, KFinEval-Pilot comprises over 1,000 curated questions across three critical areas: financial knowledge, legal reasoning, and financial toxicity. The benchmark is constructed through a semi-automated pipeline that combines GPT-4-generated prompts with expert validation to ensure domain relevance and factual accuracy. We evaluate a range of representative LLMs and observe notable performance differences across models, with trade-offs between task accuracy and output safety across different model families. These results highlight persistent challenges in applying LLMs to high-stakes financial applications, particularly in reasoning and safety. Grounded in real-world financial use cases and aligned with the Korean regulatory and linguistic context, KFinEval-Pilot serves as an early diagnostic tool for developing safer and more reliable financial AI systems.
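The abstract reports results per area and contrasts task accuracy with output safety. The sketch below shows one way such per-area scores could be tabulated. The item format, the area names ("knowledge", "legal", "toxicity"), the single pass/fail flag per item, and the macro-averaging of the two task areas into one "accuracy" figure are all illustrative assumptions. They are not the paper's actual schema or scoring protocol.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical item record: area names and the pass/fail flag are
# illustrative assumptions, not KFinEval-Pilot's actual data format.
@dataclass(frozen=True)
class ItemResult:
    area: str     # e.g. "knowledge", "legal", "toxicity"
    passed: bool  # correct answer (task areas) or safe output (toxicity area)


def per_area_scores(results: list[ItemResult]) -> dict[str, float]:
    """Return the fraction of passed items in each area."""
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for r in results:
        totals[r.area] += 1
        if r.passed:
            passes[r.area] += 1
    return {area: passes[area] / totals[area] for area in totals}


def accuracy_vs_safety(scores: dict[str, float]) -> tuple[float, float]:
    """Assumed summary: macro-average of the task areas vs. the toxicity score."""
    task_areas = [a for a in ("knowledge", "legal") if a in scores]
    if not task_areas or "toxicity" not in scores:
        raise ValueError("need at least one task area and the toxicity area")
    accuracy = sum(scores[a] for a in task_areas) / len(task_areas)
    return accuracy, scores["toxicity"]


if __name__ == "__main__":
    demo = [
        ItemResult("knowledge", True),
        ItemResult("knowledge", False),
        ItemResult("legal", True),
        ItemResult("toxicity", True),
    ]
    scores = per_area_scores(demo)
    print(scores)                      # {'knowledge': 0.5, 'legal': 1.0, 'toxicity': 1.0}
    print(accuracy_vs_safety(scores))  # (0.75, 1.0)
```

Keeping accuracy and safety as two separate numbers, rather than folding them into one score, makes the kind of trade-off the abstract describes visible when comparing models.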

Problem

Evaluates LLMs for Korean financial language understanding
Addresses lack of Korean financial domain benchmarks
Assesses model performance in reasoning and safety

Innovation

Korean financial benchmark for LLMs
Semi-automated pipeline with expert validation
Evaluates financial knowledge and safety

Authors
Bokwang Hwang, Korea Financial Telecommunications and Clearings Institute
Seonkyu Lim, Korea Financial Telecommunications and Clearings Institute
Taewoong Kim, Korea Financial Telecommunications and Clearings Institute
Yongjae Geun, Korea Financial Telecommunications and Clearings Institute
Sunghyun Bang, Korea Financial Telecommunications and Clearings Institute
Sohyun Park, Korea Financial Telecommunications and Clearings Institute
Jihyun Park, Korea Financial Telecommunications and Clearings Institute
Myeonggyu Lee, Korea Financial Telecommunications and Clearings Institute
Jinwoo Lee, Korea Financial Telecommunications and Clearings Institute
Yerin Kim, Korea Financial Telecommunications and Clearings Institute
Jinsun Yoo, Korea Financial Telecommunications and Clearings Institute
Jingyeong Hong, Korea Financial Telecommunications and Clearings Institute
Jina Park, University of Southern California
Yongchan Kim, Korea Financial Telecommunications and Clearings Institute
Suhyun Kim, Kyung Hee University
Younggyun Hahm, Teddysum
Yiseul Lee, Teddysum Inc.
Yejee Kang, Teddysum Inc.
Chanhyuk Yoon, Teddysum Inc.
Chansu Lee, SELECTSTAR Inc.
Heeyewon Jeong, SELECTSTAR Inc.
Jiyeon Lee, SELECTSTAR Inc.
Seonhye Gu, SELECTSTAR Inc.
Hyebin Kang, SELECTSTAR Inc.
Yousang Cho, Konyang University
Hangyeol Yoo, Seoultech
KyungTae Lim, École normale supérieure