Prompt Balance Matters: Understanding How Imbalanced Few-Shot Learning Affects Multilingual Sense Disambiguation in LLMs

๐Ÿ“… 2025-10-04
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This study identifies a critical problem: sample distribution imbalance in few-shot prompting severely degrades multilingual word sense disambiguation (WSD) performance, exacerbating prediction biasโ€”especially for non-English languages. Using the GLOSSGPT prompting framework, we systematically evaluate the impact of prompt balance on five languages (English, German, Spanish, French, Italian) with GPT-4o and LLaMA-3.1-70B. Our method involves controlled ablation of language proportions in few-shot exemplars. We make the first empirical finding that multilingual WSD is highly sensitive to the linguistic composition of few-shot prompts: English exhibits greater robustness due to resource advantages, whereas non-English accuracy drops sharply under imbalance. To address this, we propose and validate a balanced sampling strategy, achieving an average 12.3% absolute accuracy gain for non-English languages. This significantly improves cross-lingual generalization and provides empirically grounded, actionable guidance for designing equitable multilingual few-shot prompts.

Technology Category

Application Category

๐Ÿ“ Abstract
Recent advances in Large Language Models (LLMs) have significantly reshaped the landscape of Natural Language Processing (NLP). Among the various prompting techniques, few-shot prompting has gained considerable attention for its practicality and effectiveness. This study investigates how few-shot prompting strategies impact the Word Sense Disambiguation (WSD) task, particularly focusing on the biases introduced by imbalanced sample distributions. We use the GLOSSGPT prompting method, an advanced approach for English WSD, to test its effectiveness across five languages: English, German, Spanish, French, and Italian. Our results show that imbalanced few-shot examples can cause incorrect sense predictions in multilingual languages, but this issue does not appear in English. To assess model behavior, we evaluate both the GPT-4o and LLaMA-3.1-70B models and the results highlight the sensitivity of multilingual WSD to sample distribution in few-shot settings, emphasizing the need for balanced and representative prompting strategies.
Problem

Research questions and friction points this paper is trying to address.

Investigating imbalanced few-shot learning effects on multilingual sense disambiguation
Analyzing how sample distribution biases impact Word Sense Disambiguation across languages
Evaluating multilingual WSD sensitivity to imbalanced prompting in large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

GLOSSGPT prompting method for multilingual WSD
Analyzes imbalanced few-shot learning effects
Evaluates GPT-4o and LLaMA-3.1-70B models
๐Ÿ”Ž Similar Papers
No similar papers found.