🤖 AI Summary
This study investigates whether large language models (LLMs) can assess the naturalness of complex sentence-final forms in low-resource agglutinative languages, specifically Korean, where morphological richness and context sensitivity pose significant challenges for LLMs.
Method: We introduce KoSEnd, a Korean dataset designed for evaluating the naturalness of sentence-final forms. It comprises 3,000 sentences collected from diverse sources, each annotated for the naturalness of 15 sentence-ending forms. We systematically evaluate 11 LLMs, analyzing them by parameter count and prediction consistency.
Contribution/Results: Experiments reveal consistent limitations in the models' ability to judge the naturalness of Korean sentence endings. Notably, informing models that a sentence ending may be missing ("missing final-form prompting") improves performance, showing that explicitly accounting for specific linguistic features helps LLMs handle agglutinative morphology. The dataset and evaluation together provide a benchmark and empirical foundation for assessing LLMs on Korean sentence endings.
📝 Abstract
Although LLMs have made significant progress in various languages, there are still concerns about their effectiveness with low-resource agglutinative languages compared to languages such as English. In this study, we focused on Korean, a language known for its complex sentence endings, and evaluated LLMs on this challenging aspect. We introduce the Korean Sentence Endings (KoSEnd) dataset, which includes 3,000 sentences, each annotated for the naturalness of 15 sentence ending forms. These were collected from diverse sources to cover a range of contexts. We evaluated 11 LLMs to assess their understanding of Korean sentence endings, analyzing them based on parameter count and prediction consistency. Notably, we found that informing models about the possibility of missing sentence endings improved performance, highlighting the impact of explicitly considering certain linguistic features.
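The key finding, that telling the model a sentence ending may be missing improves naturalness judgments, can be illustrated with a minimal prompt-construction sketch. All function names and the prompt wording here are illustrative assumptions, not the paper's actual prompts:

```python
# Hypothetical sketch of the "missing sentence ending" prompting idea:
# when asking an LLM to rate the naturalness of a Korean sentence-final
# form, the prompt explicitly notes that the ending may be absent.
# The wording and function name are illustrative, not from the paper.

def build_prompt(sentence: str, ending_form: str, note_missing: bool = True) -> str:
    prompt = (
        "Rate the naturalness of the sentence-final form "
        f"'{ending_form}' in the Korean sentence below.\n"
        f"Sentence: {sentence}\n"
    )
    if note_missing:
        # Explicitly informing the model that the ending may be missing
        # is the strategy the paper reports as improving performance.
        prompt += "Note: the sentence ending may be missing entirely.\n"
    prompt += "Answer 'natural' or 'unnatural'."
    return prompt

print(build_prompt("그가 학교에 갔", "-다"))
```

The flag simply toggles one extra instruction line, which is what makes it easy to compare the two prompting conditions model by model.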