SoS: Analysis of Surface over Semantics in Multilingual Text-To-Image Generation

📅 2026-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a critical yet previously unquantified issue in multilingual text-to-image (T2I) generation: the tendency of models to prioritize a prompt's surface-level linguistic form (the language it is written in) over its semantic meaning, which often leads to culturally stereotyped outputs. The authors define and name this phenomenon “Surface-over-Semantics” (SoS) and introduce a novel SoS measure, applied to a multilingual prompt set covering 171 cultural identities translated into 14 languages. Prompting seven mainstream T2I models with this set and analyzing the models' text encoders layer by layer, they show that the surface tendency intensifies with encoder depth and frequently correlates with stereotypical visual depictions. All but one of the evaluated models exhibit strong SoS tendencies in at least two languages.

📝 Abstract
Text-to-image (T2I) models are increasingly employed by users worldwide. However, prior research has pointed to the high sensitivity of T2I models to particular input languages: when faced with languages other than English (i.e., different surface forms of the same prompt), T2I models often produce culturally stereotypical depictions, prioritizing the surface over the prompt's semantics. Yet a comprehensive analysis of this behavior, which we dub Surface-over-Semantics (SoS), is missing. We present the first analysis of T2I models' SoS tendencies. To this end, we create a set of prompts covering 171 cultural identities, translated into 14 languages, and use it to prompt seven T2I models. To quantify SoS tendencies across models, languages, and cultures, we introduce a novel measure and analyze how the tendencies we identify manifest visually. We show that all but one model exhibit strong surface-level tendency in at least two languages, with this effect intensifying across the layers of T2I text encoders. Moreover, these surface tendencies frequently correlate with stereotypical visual depictions.
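The abstract does not spell out how the SoS measure is computed, so the following is only a hypothetical illustration of what a layer-wise "surface vs. semantics" comparison could look like, not the authors' metric. It assumes that, at each text-encoder layer, one has a pooled embedding of the translated prompt, of its English counterpart (the semantic reference), and of a language-identity anchor (for instance, a mean embedding of unrelated prompts in the same language; this choice is an assumption). The helper names `sos_score` and `layerwise_sos` are invented for this sketch, and the 2-D vectors are toy numbers rather than real encoder states.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sos_score(translated: Vector, english_ref: Vector, language_anchor: Vector) -> float:
    """Hypothetical surface-vs-semantics score for one encoder layer.

    Positive when the translated prompt sits closer to the language anchor
    (surface) than to the English version of the same prompt (semantics).
    """
    return cosine(translated, language_anchor) - cosine(translated, english_ref)


def layerwise_sos(
    translated_layers: Sequence[Vector],
    english_layers: Sequence[Vector],
    anchor_layers: Sequence[Vector],
) -> List[float]:
    """Apply the hypothetical sos_score independently at every encoder layer."""
    if not (len(translated_layers) == len(english_layers) == len(anchor_layers)):
        raise ValueError("all inputs need one vector per layer")
    return [
        sos_score(t, e, a)
        for t, e, a in zip(translated_layers, english_layers, anchor_layers)
    ]


# Toy 2-D "embeddings" for a 3-layer encoder (made-up numbers, not from the paper).
english = [[1.0, 0.0]] * 3
anchor = [[0.0, 1.0]] * 3
translated = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

scores = layerwise_sos(translated, english, anchor)
print(scores)  # [-1.0, 0.0, 1.0]
```

In this toy case the score rises from negative (semantics dominates) to positive (surface dominates) across layers, which is the kind of depth-wise intensification the abstract reports; whether the paper's measure behaves this way on real encoders is not something this sketch establishes.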
Problem

Research questions and friction points this paper is trying to address.

Text-to-Image Generation
Multilingual
Surface-over-Semantics
Cultural Stereotypes
Language Bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Surface-over-Semantics
Multilingual Text-to-Image Generation
Cultural Stereotyping
Prompt Translation
Text Encoder Analysis