🤖 AI Summary
This study investigates large language models’ (LLMs) ability to discern fine-grained semantic functions of the English discourse particle “just”—including exclusivity, temporal modification, and emphasis. Using the first formal semantics–driven, polysemy-annotated dataset curated by linguistics experts, we conduct systematic evaluation via zero-shot and few-shot prompting, complemented by human validation and statistical analysis. Results show that while LLMs can distinguish broad semantic categories, they exhibit inconsistent performance on subtle discourse-level distinctions, achieving significantly lower accuracy than human annotators. The primary contribution is the establishment of the first high-precision, fine-grained benchmark for discourse particle interpretation, which reveals structural limitations in LLMs’ discourse-level semantic reasoning. This work advances methodology for evaluating model semantics by integrating formal linguistic theory with empirical model assessment, offering both a novel evaluation framework and concrete evidence of current LLMs’ discourse comprehension gaps.
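The evaluation described above relies on zero-shot and few-shot prompting over a fixed sense inventory. A minimal sketch of how such a zero-shot classification prompt might be constructed is shown below; the label set (exclusive, temporal, emphatic) comes from the summary, but the function name and prompt wording are illustrative assumptions, not the paper's actual setup.

```python
# Hypothetical zero-shot prompt builder for classifying senses of "just".
# Labels follow the summary above; wording is illustrative, not the paper's.

SENSES = ["exclusive", "temporal", "emphatic"]

def build_zero_shot_prompt(sentence: str) -> str:
    """Return a single zero-shot classification prompt for one sentence."""
    options = ", ".join(SENSES)
    return (
        "Classify the function of the word 'just' in the sentence below.\n"
        f"Choose exactly one label from: {options}.\n\n"
        f"Sentence: {sentence}\n"
        "Label:"
    )

prompt = build_zero_shot_prompt("She just left the office.")
print(prompt)
```

A few-shot variant would simply prepend a handful of expert-labeled sentence/label pairs to the same prompt before the target sentence.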
📝 Abstract
Discourse particles are crucial elements that subtly shape the meaning of text. These words are often polyfunctional, giving rise to nuanced and sometimes quite disparate semantic/discourse effects, as exemplified by the diverse uses of the particle "just" (e.g., exclusive, temporal, emphatic). This work investigates the capacity of LLMs to distinguish the fine-grained senses of English "just", a well-studied example in formal semantics, using data meticulously created and labeled by expert linguists. Our findings reveal that while LLMs exhibit some ability to differentiate between broader categories, they struggle to fully capture more subtle nuances, highlighting a gap in their understanding of discourse particles.