An Exploration-Analysis-Disambiguation Reasoning Framework for Word Sense Disambiguation with Low-Parameter LLMs

📅 2026-03-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses word sense disambiguation (WSD) with low-parameter large language models, which often struggle to identify rare or domain-specific word senses. The authors propose an Exploration-Analysis-Disambiguation reasoning framework that combines chain-of-thought reasoning with analysis of neighbouring context words. Using semi-automatically curated, rationale-rich annotations, they perform reasoning-centered fine-tuning on low-parameter (<4B) models such as Gemma and Qwen. The resulting approach performs comparably to zero-shot GPT-4-Turbo, reportedly the first such result for low-parameter models, and surpasses all medium-parameter baselines and prior state-of-the-art systems on the FEWS dataset. It also generalizes robustly to unseen senses and to cross-domain settings, as shown by strong performance on the “Fool Me If You Can” benchmark without task-specific fine-tuning.

📝 Abstract

Word Sense Disambiguation (WSD) remains a key challenge in Natural Language Processing (NLP), especially when dealing with rare or domain-specific senses that are often misinterpreted. While modern high-parameter Large Language Models (LLMs) such as GPT-4-Turbo have shown state-of-the-art WSD performance, their computational and energy demands limit scalability. This study investigates whether low-parameter LLMs (<4B parameters) can achieve comparable results through fine-tuning strategies that emphasize reasoning-driven sense identification. Using the FEWS dataset augmented with semi-automated, rationale-rich annotations, we fine-tune eight small-scale open-source LLMs (e.g., Gemma and Qwen). Our results reveal that Chain-of-Thought (CoT)-based reasoning combined with neighbour-word analysis achieves performance comparable to GPT-4-Turbo in zero-shot settings. Importantly, Gemma-3-4B and Qwen-3-4B models consistently outperform all medium-parameter baselines and state-of-the-art models on FEWS, with robust generalization to unseen senses. Furthermore, evaluation on the unseen “Fool Me If You Can” dataset confirms strong cross-domain adaptability without task-specific fine-tuning. This work demonstrates that with carefully crafted reasoning-centric fine-tuning, low-parameter LLMs can deliver accurate WSD while substantially reducing computational and energy demands.
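
To make the idea of combining neighbour-word analysis with step-by-step reasoning more concrete, the following is a minimal sketch of how a three-stage prompt might be assembled. It is not the authors' implementation: the function names (`neighbour_words`, `build_prompt`), the window size, the whitespace tokenization, and the prompt wording are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import List


def neighbour_words(tokens: List[str], target_index: int, window: int = 3) -> List[str]:
    """Return up to `window` tokens on each side of the target, excluding the target.

    Hypothetical helper: the window size and tokenization are illustrative assumptions.
    """
    left = tokens[max(0, target_index - window):target_index]
    right = tokens[target_index + 1:target_index + 1 + window]
    return left + right


def build_prompt(sentence: str, target: str, senses: List[str], window: int = 3) -> str:
    """Assemble an explore / analyze / disambiguate prompt (illustrative wording only)."""
    tokens = sentence.split()
    # Assumes the target occurs as a whitespace-delimited token; raises ValueError otherwise.
    target_index = tokens.index(target)
    neighbours = neighbour_words(tokens, target_index, window)
    sense_lines = "\n".join(f"{i}. {s}" for i, s in enumerate(senses, start=1))
    return (
        f"Sentence: {sentence}\n"
        f"Target word: {target}\n"
        f"Candidate senses:\n{sense_lines}\n"
        f"Neighbouring words: {', '.join(neighbours)}\n"
        "Step 1 (Explore): describe what each candidate sense means.\n"
        "Step 2 (Analyze): explain how the neighbouring words support or rule out each sense.\n"
        "Step 3 (Disambiguate): answer with the number of the best-fitting sense.\n"
    )


if __name__ == "__main__":
    print(build_prompt(
        "She sat on the bank of the river",
        "bank",
        ["financial institution", "sloping land beside a body of water"],
        window=2,
    ))
```

In a fine-tuning setup of the kind the abstract describes, a prompt like this would presumably be paired with a rationale-rich target answer; the exact training format is not given in this summary.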

Problem

Research questions and friction points this paper is trying to address.

Word Sense Disambiguation

Low-Parameter LLMs

Computational Efficiency

Rare Senses

Domain-Specific Senses

Innovation

Methods, ideas, or system contributions that make the work stand out.

Word Sense Disambiguation

Low-Parameter LLMs

Chain-of-Thought Reasoning