SwanNLP at SemEval-2026 Task 5: An LLM-based Framework for Plausibility Scoring in Narrative Word Sense Disambiguation

📅 2026-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the underexplored word sense disambiguation capabilities of large language models in narrative contexts. It proposes a framework that uses a structured reasoning mechanism to score the human-perceived plausibility of homonymous word senses within short stories. The approach combines dynamic few-shot prompting for large models, low-rank parameter-efficient fine-tuning for smaller models, and multi-model ensembling to approximate the consensus judgments of multiple human annotators. Experimental results indicate that the framework closely replicates human ratings of word sense plausibility, and that model ensembling yields a further slight improvement while better matching the agreement patterns of the human annotators, outperforming existing baselines.
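As a rough illustration of the dynamic few-shot prompting idea mentioned above, the sketch below picks the labelled stories most similar to a query story and places them in the prompt. Everything here is an assumption made for illustration: the token-overlap (Jaccard) similarity, the 1 to 5 rating scale, the prompt wording, and the names `jaccard`, `select_examples`, and `build_prompt` are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Example = Tuple[str, int]  # (story text, human plausibility rating); assumed format


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for whatever retriever the paper uses."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_examples(query: str, pool: List[Example], k: int = 2) -> List[Example]:
    """Return the k labelled examples most similar to the query story."""
    ranked = sorted(pool, key=lambda ex: jaccard(query, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, pool: List[Example], k: int = 2) -> str:
    """Assemble a few-shot prompt whose demonstrations depend on the query."""
    # The 1-5 scale and the instruction wording are assumptions, not the paper's prompt.
    parts = ["Rate the plausibility of the word sense from 1 (implausible) to 5 (plausible)."]
    for story, rating in select_examples(query, pool, k):
        parts.append(f"Story: {story}\nRating: {rating}")
    parts.append(f"Story: {query}\nRating:")
    return "\n\n".join(parts)


pool = [
    ("She sat on the bank of the river", 5),
    ("He deposited cash at the bank", 1),
    ("The dog barked at the moon", 3),
]
prompt = build_prompt("They fished from the bank of the river", pool, k=1)
```

Because the demonstrations are chosen per query, each test story is shown different examples, which is what makes the prompting "dynamic" rather than fixed.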

📝 Abstract

Recent advances in language models have substantially improved Natural Language Understanding (NLU). Although widely used benchmarks suggest that Large Language Models (LLMs) can effectively disambiguate word senses, their practical applicability in real-world narrative contexts remains underexplored. SemEval-2026 Task 5 addresses this gap by introducing a task that predicts the human-perceived plausibility of a word sense within a short story. In this work, we propose an LLM-based framework for plausibility scoring of homonymous word senses in narrative texts using a structured reasoning mechanism. We examine the impact of fine-tuning low-parameter LLMs with diverse reasoning strategies, alongside dynamic few-shot prompting for large-parameter models, on accurate sense identification and plausibility estimation. Our results show that commercial large-parameter LLMs with dynamic few-shot prompting closely replicate human-like plausibility judgments. Furthermore, model ensembling slightly improves performance, better simulating the agreement patterns of five human annotators compared to single-model predictions.
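The ensembling step described in the abstract could be sketched as a simple average of per-model plausibility scores. This is a minimal sketch under stated assumptions: the abstract does not say how predictions are combined, so the unweighted mean, the 1 to 5 clipping range, and the name `ensemble_scores` are all hypothetical.

```python
from statistics import mean
from typing import Dict, List


def ensemble_scores(
    predictions: Dict[str, List[float]],
    lo: float = 1.0,  # assumed lower bound of the rating scale
    hi: float = 5.0,  # assumed upper bound of the rating scale
) -> List[float]:
    """Average each item's score across models, then clip to the rating range."""
    runs = list(predictions.values())
    if not runs:
        raise ValueError("need predictions from at least one model")
    n_items = len(runs[0])
    if any(len(run) != n_items for run in runs):
        raise ValueError("every model must score the same number of items")
    # zip(*runs) groups the scores that different models gave to the same item.
    return [min(hi, max(lo, mean(scores))) for scores in zip(*runs)]


# Hypothetical scores from two models on two (story, sense) items.
combined = ensemble_scores({"model_a": [1.0, 4.0], "model_b": [3.0, 5.0]})
```

Averaging several models loosely mirrors how a gold rating can be formed from several annotators, which is one plausible reason an ensemble tracks human agreement better than a single model.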

Problem

Research questions and friction points this paper is trying to address.

plausibility scoring

narrative word sense disambiguation

homonymous word senses

human-perceived plausibility

natural language understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

plausibility scoring

narrative word sense disambiguation

large language models