Evaluating Large Language Models on Understanding Korean Indirect Speech Acts

📅 2025-02-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study systematically evaluates the pragmatic understanding of mainstream large language models (LLMs) with respect to Korean indirect speech acts, focusing on authentic dialogues where the literal meaning diverges from the intended meaning. We propose the first Korean-specific, dual-paradigm evaluation framework comprising multiple-choice questions (MCQ) and open-ended questions (OEQ), integrating contextualized prompting and human-annotated ground-truth benchmarks. Experimental results reveal a substantial pragmatic gap: even the strongest model, Claude3-Opus, achieves only 71.94% accuracy on MCQ and 65.0% on OEQ, significantly below human performance. All models struggle to detect indirectness, and performance deteriorates markedly as the degree of indirectness increases. This work provides the first quantitative characterization of LLMs' limitations in Korean indirect speech comprehension, establishing a novel benchmark and analytical lens for assessing pragmatic competence in multilingual LLMs.
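The summary describes a contextualized MCQ protocol scored against human-annotated answers. As a rough illustration of what the MCQ half of such a loop can look like, here is a minimal Python sketch; the MCQItem fields, prompt wording, example item, and `ask_model` callable are all hypothetical placeholders, not the authors' released code or data.

```python
# Illustrative sketch only: every field, prompt, and example below is a
# hypothetical reconstruction of the described MCQ evaluation, not the
# paper's actual pipeline or dataset.

from dataclasses import dataclass

@dataclass
class MCQItem:
    context: str        # preceding dialogue turns
    utterance: str      # target utterance whose intention is being tested
    options: list[str]  # candidate paraphrases of the speaker's intention
    gold: str           # human-annotated correct option label, e.g. "B"

def build_prompt(item: MCQItem) -> str:
    """Assemble a contextualized multiple-choice prompt (format is assumed)."""
    option_lines = "\n".join(
        f"{label}. {text}" for label, text in zip("ABCD", item.options)
    )
    return (
        "Given the dialogue context, choose the option that best describes "
        "the speaker's intended meaning.\n\n"
        f"Context:\n{item.context}\n\n"
        f"Utterance: {item.utterance}\n\n"
        f"Options:\n{option_lines}\n\n"
        "Answer with a single letter (A-D)."
    )

def evaluate_mcq(items: list[MCQItem], ask_model) -> float:
    """Return accuracy; `ask_model` is any callable mapping prompt -> reply."""
    correct = sum(
        ask_model(build_prompt(item)).strip().upper()[:1] == item.gold
        for item in items
    )
    return correct / len(items)

# Toy usage with a stub "model" that always answers B:
demo = [MCQItem(
    context="A: 뭐 필요한 거 있어? (Do you need anything?)",
    utterance="B: 방이 좀 춥네요. (The room is a bit cold.)",
    options=["A factual report of the temperature",
             "An indirect request to turn up the heat",
             "A refusal of A's offer",
             "A compliment about the room"],
    gold="B",
)]
print(f"MCQ accuracy: {evaluate_mcq(demo, lambda p: 'B'):.1%}")
```

The OEQ paradigm would presumably replace the lettered options with a free-form question about the speaker's intention, which is why the paper analyzes OEQ response patterns rather than reducing them to a single choice.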

📝 Abstract
Accurately understanding the intention of an utterance is crucial in conversational communication. As conversational artificial intelligence models are rapidly being developed and applied in various fields, it is important to evaluate LLMs' ability to understand the intention behind a user's utterance. This study evaluates whether current LLMs can infer the intention of an utterance from the given conversational context, particularly in cases where the actual intention differs from the surface-level, literal meaning of the sentence, i.e., indirect speech acts. Our findings reveal that Claude3-Opus outperformed the competing models, with 71.94% accuracy on the MCQ task and 65% on the OEQ task, showing a clear advantage. In general, proprietary models exhibited higher performance than open-source models. Nevertheless, no LLM reached the level of human performance. Most LLMs, except for Claude3-Opus, performed significantly worse on indirect speech acts than on direct speech acts, where the intention is explicitly revealed in the utterance. This study not only provides an overall pragmatic evaluation of each LLM's language use through an analysis of OEQ response patterns, but also underscores the need for further research to improve LLMs' understanding of indirect speech acts for more natural communication with humans.
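The abstract's direct-versus-indirect contrast amounts to a per-category accuracy breakdown. Below is a minimal sketch of that analysis, with an invented record shape and toy numbers rather than the paper's data.

```python
# Hypothetical breakdown of accuracy by speech-act type; the record format
# and toy scores are illustrative, not taken from the paper's dataset.
from collections import defaultdict

def accuracy_by_act_type(records):
    """records: iterable of (act_type, is_correct) pairs."""
    totals, hits = defaultdict(int), defaultdict(int)
    for act_type, is_correct in records:
        totals[act_type] += 1
        hits[act_type] += int(is_correct)
    return {t: hits[t] / totals[t] for t in totals}

# Toy scores mimicking the reported pattern (direct > indirect):
toy = [("direct", True), ("direct", True), ("direct", False),
       ("indirect", True), ("indirect", False), ("indirect", False)]
print(accuracy_by_act_type(toy))
# -> {'direct': 0.666..., 'indirect': 0.333...}
```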
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs on Korean indirect speech acts
Assessing intention understanding in conversational AI
Comparing human and LLM performance in communication
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates LLMs on Korean indirect speech acts
Claude3-Opus outperforms in MCQ and OEQ
Proprietary models surpass open-source in performance
Youngeun Koo
Department of German Language and Literature, Sungkyunkwan University, Seoul, South Korea
Jiwoo Lee
Dojun Park
Artificial Intelligence Institute of Seoul National University(AIIS), Seoul, South Korea
Seohyun Park
Korea University
Sungeun Lee
Artificial Intelligence Institute of Seoul National University(AIIS), Seoul, South Korea; Department of German Language and Literature, Seoul National University, Seoul, South Korea; Brain Humanities Lab, Seoul National University, Seoul, South Korea