AI Summary
This work addresses the underexplored problem of how well large language models (LLMs) disambiguate ambiguous emojis in minimal contrastive textual contexts. To this end, the authors introduce EMODIS, the first benchmark specifically designed for evaluating context-dependent emoji disambiguation, built upon human-crafted triplets (ambiguous sentence, contrastive contexts, reasoning question). It adopts a linguistically inspired task framework and evaluates a diverse set of open- and closed-source LLMs. The key contribution is a contrastive context-pair design that systematically reveals two pervasive LLM biases: insensitivity to subtle pragmatic cues and a strong preference for dominant interpretations. Experiments show that even state-of-the-art models frequently fail on fine-grained contextual distinctions, highlighting a substantial gap between LLM and human semantic reasoning. The benchmark provides a reproducible, extensible testbed for future research on emoji semantics and contextual inference.
Abstract
Large language models (LLMs) are increasingly deployed in real-world communication settings, yet their ability to resolve context-dependent ambiguity remains underexplored. In this work, we present EMODIS, a new benchmark for evaluating LLMs' capacity to interpret ambiguous emoji expressions under minimal but contrastive textual contexts. Each instance in EMODIS comprises an ambiguous sentence containing an emoji, two distinct disambiguating contexts that lead to divergent interpretations, and a specific question that requires contextual reasoning. We evaluate both open-source and API-based LLMs and find that even the strongest models frequently fail to distinguish meanings when only subtle contextual cues are present. Further analysis reveals systematic biases toward dominant interpretations and limited sensitivity to pragmatic contrast. EMODIS provides a rigorous testbed for assessing contextual disambiguation and highlights the gap in semantic reasoning between humans and LLMs.
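The instance structure described in the abstract (one ambiguous sentence, two contrasting contexts, one question) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class `EmojiInstance`, its field names, the prompt template, and the `pair_accuracy` metric are hypothetical and are not taken from the EMODIS release. The example sentence is invented, not drawn from the benchmark. The pair-level metric credits a model only when it answers correctly under both contexts, which is one plausible way to expose a bias toward a dominant interpretation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class EmojiInstance:
    """Hypothetical record layout; field names are assumptions, not the EMODIS schema."""
    sentence: str               # ambiguous sentence containing an emoji
    contexts: Tuple[str, str]   # two contrasting disambiguating contexts
    question: str               # question that requires contextual reasoning
    answers: Tuple[str, str]    # expected answer under each context


def build_prompts(inst: EmojiInstance) -> Tuple[str, ...]:
    """Render one prompt per context, so the same sentence is asked twice."""
    return tuple(
        f"Context: {ctx}\nMessage: {inst.sentence}\nQuestion: {inst.question}"
        for ctx in inst.contexts
    )


def pair_accuracy(
    instances: Sequence[EmojiInstance],
    answer_fn: Callable[[str], str],
) -> float:
    """Fraction of instances answered correctly under BOTH contexts (assumed metric)."""
    if not instances:
        return 0.0
    hits = 0
    for inst in instances:
        preds = [answer_fn(prompt) for prompt in build_prompts(inst)]
        # Case-insensitive exact match against the expected answer per context.
        if all(
            pred.strip().lower() == gold.strip().lower()
            for pred, gold in zip(preds, inst.answers)
        ):
            hits += 1
    return hits / len(instances)


if __name__ == "__main__":
    # Invented example for illustration only; not an item from the benchmark.
    example = EmojiInstance(
        sentence="Great job on the presentation 🙂",
        contexts=(
            "The speaker received a standing ovation.",
            "The speaker forgot half of the slides.",
        ),
        question="Is the message sincere or sarcastic?",
        answers=("sincere", "sarcastic"),
    )
    # Stand-in for a model that always picks the dominant reading.
    always_sincere = lambda prompt: "sincere"
    print(pair_accuracy([example], always_sincere))
```

Under this assumed metric, a stub that returns the same answer regardless of context cannot score on any instance whose two contexts call for different answers, which is the failure mode the contrastive design is meant to surface.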