🤖 AI Summary
This study addresses how AI systems may perpetuate and amplify societal biases when processing skin-tone-modified emojis, undermining fairness and inclusivity in online identity expression. It presents the first large-scale comparative analysis of specialized emoji embedding models (emoji2vec and emoji-sw2v) and mainstream large language models (Llama, Gemma, Qwen, and Mistral) with respect to skin-tone emoji representations. Through systematic evaluation of semantic consistency, representational similarity, sentiment polarity, and core bias dimensions, the research reveals that the specialized models exhibit severe semantic distortions, while the LLMs, despite supporting skin-tone modifiers, still harbor systemic biases that manifest as inconsistent sentiment associations and semantic misalignments across skin tones. These findings expose structural flaws in foundational models and underscore the urgent need for platforms to implement rigorous algorithmic auditing and mitigation mechanisms.
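To make the representational-similarity check concrete, below is a minimal sketch of how skin-tone variants compose in Unicode (a base emoji followed by one of the five Fitzpatrick modifiers, U+1F3FB through U+1F3FF) and how one might compare each variant's embedding against the base. The embedding model (`all-MiniLM-L6-v2` via `sentence-transformers`) and the 👍 example are illustrative assumptions, not the models or protocol used in the paper.

```python
# Sketch of a representational-similarity probe across skin tones.
# Assumption: sentence-transformers as a stand-in embedder; the paper's
# models (emoji2vec, emoji-sw2v, and the four LLMs) are not used here.
import numpy as np
from sentence_transformers import SentenceTransformer

# The five Unicode Fitzpatrick modifiers, U+1F3FB through U+1F3FF.
FITZPATRICK = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]

def skin_tone_variants(base: str) -> list[str]:
    """A skin-tone variant is simply the base emoji + a modifier codepoint."""
    return [base + mod for mod in FITZPATRICK]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative stand-in

base = "\U0001F44D"  # 👍 THUMBS UP SIGN
texts = [base] + skin_tone_variants(base)
vecs = model.encode(texts)

# A tone-neutral representation would keep every variant at roughly the
# same similarity to the base; a large spread across tones is a red flag.
for text, vec in zip(texts[1:], vecs[1:]):
    print(f"{text}  cosine to base = {cosine(vecs[0], vec):.3f}")
```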
📝 Abstract
Skin-toned emojis are crucial for fostering personal identity and social inclusion in online communication. As AI models, particularly Large Language Models (LLMs), increasingly mediate interactions on web platforms, there is significant concern that these systems perpetuate societal biases through their representation of such symbols. This paper presents the first large-scale comparative study of bias in skin-toned emoji representations across two distinct model classes. We systematically evaluate dedicated emoji embedding models (emoji2vec, emoji-sw2v) against four modern LLMs (Llama, Gemma, Qwen, and Mistral). Our analysis first reveals a critical performance gap: while the LLMs demonstrate robust support for skin-tone modifiers, widely used specialized emoji models exhibit severe deficiencies. More importantly, a multi-faceted investigation into semantic consistency, representational similarity, sentiment polarity, and core biases uncovers systemic disparities. We find evidence of skewed sentiment and inconsistent meanings associated with emojis across different skin tones, highlighting latent biases within these foundational models. Our findings underscore the urgent need for developers and platforms to audit and mitigate these representational harms, ensuring that AI's role on the web promotes genuine equity rather than reinforcing societal biases.
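As a flavor of what a sentiment-polarity audit looks like, the sketch below scores one fixed message paired with each skin-tone variant of a single emoji. VADER is used purely as a self-contained, lexicon-based stand-in (the study evaluates LLMs, not VADER), but the fairness criterion it illustrates is the same: identical text should receive the same sentiment score regardless of skin tone.

```python
# Sketch of a sentiment-consistency probe across skin tones.
# Assumption: VADER (pip install vaderSentiment) as a stand-in scorer;
# the paper's actual subjects are LLMs.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# The five Unicode Fitzpatrick skin-tone modifiers.
FITZPATRICK = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]

analyzer = SentimentIntensityAnalyzer()
base = "\U0001F44D"  # 👍

for emoji in [base] + [base + mod for mod in FITZPATRICK]:
    # Same sentence every time; only the emoji's skin tone changes.
    score = analyzer.polarity_scores(f"Great job {emoji}")["compound"]
    print(f"{emoji}  compound = {score:+.3f}")

# Diverging scores across variants, or variants silently falling back to
# the unmodified baseline, would both signal the kind of inconsistent
# handling of skin-tone modifiers that the study reports.
```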