Prompt Balance Matters: Understanding How Imbalanced Few-Shot Learning Affects Multilingual Sense Disambiguation in LLMs

📅 2025-10-04

📈 Citations: 0

✨ Influential: 0

career value

145K/year

🤖 AI Summary

This study identifies a critical problem: sample distribution imbalance in few-shot prompting severely degrades multilingual word sense disambiguation (WSD) performance, exacerbating prediction bias—especially for non-English languages. Using the GLOSSGPT prompting framework, we systematically evaluate the impact of prompt balance on five languages (English, German, Spanish, French, Italian) with GPT-4o and LLaMA-3.1-70B. Our method involves controlled ablation of language proportions in few-shot exemplars. We make the first empirical finding that multilingual WSD is highly sensitive to the linguistic composition of few-shot prompts: English exhibits greater robustness due to resource advantages, whereas non-English accuracy drops sharply under imbalance. To address this, we propose and validate a balanced sampling strategy, achieving an average 12.3% absolute accuracy gain for non-English languages. This significantly improves cross-lingual generalization and provides empirically grounded, actionable guidance for designing equitable multilingual few-shot prompts.

Technology Category

Application Category

📝 Abstract

Recent advances in Large Language Models (LLMs) have significantly reshaped the landscape of Natural Language Processing (NLP). Among the various prompting techniques, few-shot prompting has gained considerable attention for its practicality and effectiveness. This study investigates how few-shot prompting strategies impact the Word Sense Disambiguation (WSD) task, particularly focusing on the biases introduced by imbalanced sample distributions. We use the GLOSSGPT prompting method, an advanced approach for English WSD, to test its effectiveness across five languages: English, German, Spanish, French, and Italian. Our results show that imbalanced few-shot examples can cause incorrect sense predictions in multilingual languages, but this issue does not appear in English. To assess model behavior, we evaluate both the GPT-4o and LLaMA-3.1-70B models and the results highlight the sensitivity of multilingual WSD to sample distribution in few-shot settings, emphasizing the need for balanced and representative prompting strategies.

Problem

Research questions and friction points this paper is trying to address.

Investigating imbalanced few-shot learning effects on multilingual sense disambiguation

Analyzing how sample distribution biases impact Word Sense Disambiguation across languages

Evaluating multilingual WSD sensitivity to imbalanced prompting in large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

GLOSSGPT prompting method for multilingual WSD

Analyzes imbalanced few-shot learning effects

Evaluates GPT-4o and LLaMA-3.1-70B models

🔎 Similar Papers

Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models