๐ค AI Summary
This study identifies a critical problem: sample distribution imbalance in few-shot prompting severely degrades multilingual word sense disambiguation (WSD) performance, exacerbating prediction biasโespecially for non-English languages. Using the GLOSSGPT prompting framework, we systematically evaluate the impact of prompt balance on five languages (English, German, Spanish, French, Italian) with GPT-4o and LLaMA-3.1-70B. Our method involves controlled ablation of language proportions in few-shot exemplars. We make the first empirical finding that multilingual WSD is highly sensitive to the linguistic composition of few-shot prompts: English exhibits greater robustness due to resource advantages, whereas non-English accuracy drops sharply under imbalance. To address this, we propose and validate a balanced sampling strategy, achieving an average 12.3% absolute accuracy gain for non-English languages. This significantly improves cross-lingual generalization and provides empirically grounded, actionable guidance for designing equitable multilingual few-shot prompts.
๐ Abstract
Recent advances in Large Language Models (LLMs) have significantly reshaped the landscape of Natural Language Processing (NLP). Among the various prompting techniques, few-shot prompting has gained considerable attention for its practicality and effectiveness. This study investigates how few-shot prompting strategies impact the Word Sense Disambiguation (WSD) task, particularly focusing on the biases introduced by imbalanced sample distributions. We use the GLOSSGPT prompting method, an advanced approach for English WSD, to test its effectiveness across five languages: English, German, Spanish, French, and Italian. Our results show that imbalanced few-shot examples can cause incorrect sense predictions in multilingual languages, but this issue does not appear in English. To assess model behavior, we evaluate both the GPT-4o and LLaMA-3.1-70B models and the results highlight the sensitivity of multilingual WSD to sample distribution in few-shot settings, emphasizing the need for balanced and representative prompting strategies.