🤖 AI Summary
This study investigates whether large language models (LLMs) can assess the naturalness of complex sentence-final forms in low-resource agglutinative languages, specifically Korean, where morphological richness and context sensitivity pose significant challenges for LLMs.
Method: We introduce KoSEnd, a Korean dataset designed for evaluating the naturalness of sentence-final forms. It comprises 3,000 sentences collected from diverse sources, each annotated for the naturalness of 15 sentence-ending forms. We systematically evaluate 11 LLMs, analyzing them by parameter count and prediction consistency.
Contribution/Results: Experiments reveal consistent limitations in the models' ability to judge the naturalness of Korean sentence endings. Notably, informing models that a sentence ending may be missing ("missing final-form prompting") improves performance, showing that explicitly accounting for specific linguistic features helps LLMs handle agglutinative morphology. The dataset and evaluation together provide a benchmark and empirical foundation for assessing LLMs on Korean sentence endings.
📝 Abstract
Although LLMs have made significant progress in various languages, there are still concerns about their effectiveness with low-resource agglutinative languages compared to languages such as English. In this study, we focused on Korean, a language known for its complex sentence endings, and evaluated LLMs on this challenging aspect. We introduce the Korean Sentence Endings (KoSEnd) dataset, which includes 3,000 sentences, each annotated for the naturalness of 15 sentence ending forms. These were collected from diverse sources to cover a range of contexts. We evaluated 11 LLMs to assess their understanding of Korean sentence endings, analyzing them based on parameter count and prediction consistency. Notably, we found that informing models about the possibility of missing sentence endings improved performance, highlighting the impact of explicitly considering certain linguistic features.
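The key finding, that telling the model a sentence ending may be missing improves naturalness judgments, can be illustrated with a minimal prompt-construction sketch. All function names and the prompt wording here are illustrative assumptions, not the paper's actual prompts:

```python
# Hypothetical sketch of the "missing sentence ending" prompting idea:
# when asking an LLM to rate the naturalness of a Korean sentence-final
# form, the prompt explicitly notes that the ending may be absent.
# The wording and function name are illustrative, not from the paper.

def build_prompt(sentence: str, ending_form: str, note_missing: bool = True) -> str:
    prompt = (
        "Rate the naturalness of the sentence-final form "
        f"'{ending_form}' in the Korean sentence below.\n"
        f"Sentence: {sentence}\n"
    )
    if note_missing:
        # Explicitly informing the model that the ending may be missing
        # is the strategy the paper reports as improving performance.
        prompt += "Note: the sentence ending may be missing entirely.\n"
    prompt += "Answer 'natural' or 'unnatural'."
    return prompt

print(build_prompt("그가 학교에 갔", "-다"))
```

The flag simply toggles one extra instruction line, which is what makes it easy to compare the two prompting conditions model by model.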