Un-Doubling Diffusion: LLM-guided Disambiguation of Homonym Duplication

📅 2025-09-25
🤖 AI Summary
Homonyms, words with identical spelling but distinct meanings, cause “homonym duplication” in text-to-image diffusion models: several senses of the word are rendered in the same image. The problem is compounded by Anglocentric pipelines that translate non-English prompts into English before generation, so a term that is unambiguous in the source language can be mapped to an English homonym and lose its intended meaning. The paper introduces a quantitative measure of duplication rate and evaluates several diffusion models with both automatic assessment based on Vision-Language Models (VLMs) and human evaluation. As mitigation, it studies LLM-guided, context-aware prompt expansion and reports that this reduces duplication, including duplication introduced by the translation step.

📝 Abstract
Homonyms are words with identical spelling but distinct meanings, which pose challenges for many generative models. When a homonym appears in a prompt, diffusion models may generate multiple senses of the word simultaneously, which is known as homonym duplication. This issue is further complicated by an Anglocentric bias, which includes an additional translation step before the text-to-image model pipeline. As a result, even words that are not homonymous in the original language may become homonyms and lose their meaning after translation into English. In this paper, we introduce a method for measuring duplication rates and conduct evaluations of different diffusion models using both automatic evaluation utilizing Vision-Language Models (VLM) and human evaluation. Additionally, we investigate methods to mitigate the homonym duplication problem through prompt expansion, demonstrating that this approach also effectively reduces duplication related to Anglocentric bias. The code for the automatic evaluation pipeline is publicly available.

Problem

Research questions and friction points this paper is trying to address.

Homonyms cause diffusion models to generate multiple meanings simultaneously
Anglocentric bias creates artificial homonyms through translation issues
Measuring and reducing homonym duplication in text-to-image generation
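
The measurement idea can be illustrated with a minimal sketch. The summary does not give the paper's exact definition of the duplication rate, so this assumes it is the share of generated images in which a VLM detects more than one sense of the homonym. The `Judgment` record and its field names are hypothetical, not taken from the authors' pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One VLM verdict for one generated image (hypothetical schema)."""
    homonym: str
    senses_detected: frozenset  # senses the VLM reported seeing in the image


def duplication_rate(judgments: list[Judgment]) -> float:
    """Fraction of images showing more than one sense of the homonym.

    Assumed definition: an image counts as duplicated when the VLM
    reports at least two distinct senses in it.
    """
    if not judgments:
        return 0.0
    duplicated = sum(1 for j in judgments if len(j.senses_detected) > 1)
    return duplicated / len(judgments)


# Toy verdicts: two of the four images mix two senses.
judgments = [
    Judgment("bat", frozenset({"animal", "baseball bat"})),
    Judgment("bat", frozenset({"animal"})),
    Judgment("crane", frozenset({"bird", "machine"})),
    Judgment("crane", frozenset({"machine"})),
]
rate = duplication_rate(judgments)
```

Comparing this rate before and after a mitigation step, on the same prompts and seeds, gives a simple way to check whether the mitigation helped.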

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-guided disambiguation of homonym duplication
Measuring duplication rates with automatic VLM evaluation
Mitigating duplication through prompt expansion techniques
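
Prompt expansion can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the instruction wording, the `expand_prompt` helper, and the `llm` callable (any function that maps an instruction string to generated text) are all hypothetical. A stub stands in for a real LLM so the example runs without network access.

```python
from typing import Callable

# Hypothetical instruction; the paper's actual prompt is not given in this summary.
INSTRUCTION_TEMPLATE = (
    "The word '{word}' in the image prompt below is ambiguous. "
    "Its intended meaning is: {sense}. "
    "Rewrite the prompt so that only this meaning is possible. "
    "Return only the rewritten prompt.\n\nPrompt: {prompt}"
)


def expand_prompt(
    prompt: str,
    word: str,
    sense: str,
    llm: Callable[[str], str],
) -> str:
    """Ask an LLM to rewrite the prompt so the homonym has a single sense.

    Falls back to the original prompt if the LLM returns an empty string.
    """
    instruction = INSTRUCTION_TEMPLATE.format(word=word, sense=sense, prompt=prompt)
    expanded = llm(instruction).strip()
    return expanded or prompt


def stub_llm(instruction: str) -> str:
    """Stand-in for a real LLM call; always returns a fixed rewrite."""
    return "a photo of a bat, the nocturnal flying mammal, hanging in a cave"


expanded = expand_prompt("a photo of a bat", "bat", "flying mammal", stub_llm)
```

For translated prompts, the intended sense could be taken from the source-language word before translation, which is where the Anglocentric ambiguity is introduced.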

Authors

Evgeny Kaskov, SberAI, Moscow, Russia
Elizaveta Petrova, SberDevices (Computer Vision)
Petr Surovtsev, SberAI, Moscow, Russia
Anna Kostikova, SberAI, Moscow, Russia
Ilya Mistiurin, SberAI, Moscow, Russia
Alexander Kapitanov, SberDevices (Computer Vision)
Alexander Nagaev, SberDevices