LogoDiffuser: Training-Free Multilingual Logo Generation and Stylization via Letter-Aware Attention Control

📅 2026-03-10
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses the challenge of generating multilingual logos with high fidelity and visual appeal, a task where existing text-to-image methods often distort character structures and struggle to generalize to unseen languages. The authors propose a training-free, character-aware generative framework that treats target characters as image inputs and integrates their geometric structure with visual style through a multimodal diffusion Transformer. Key innovations include joint attention analysis, cross-layer attention map aggregation, and core token injection, enabling precise fusion of typographic form and design aesthetics. This approach achieves, for the first time, high-quality logo generation across arbitrary languages without additional training. Extensive user studies and quantitative evaluations demonstrate its superior performance in both character fidelity and design quality compared to state-of-the-art methods.

📝 Abstract
Recent advances in text-to-image generation have been remarkable, but generating multilingual design logos that harmoniously integrate visual and textual elements remains a challenging task. Existing methods often distort character geometry when applying creative styles and struggle to support multilingual text generation without additional training. To address these challenges, we propose LogoDiffuser, a training-free method that synthesizes multilingual logo designs using a multimodal diffusion transformer. Instead of using textual prompts, we input the target characters as images, enabling robust control over character structure regardless of language. We first analyze the joint attention mechanism to identify core tokens, i.e., tokens that respond strongly to character structure. Building on this observation, our method integrates character structure and visual design by injecting the most informative attention maps. Furthermore, we aggregate attention maps across layers to mitigate attention shifts between layers and obtain consistent core tokens. Extensive experiments and user studies demonstrate that our method achieves state-of-the-art performance in multilingual logo generation.
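The abstract's cross-layer attention aggregation and core-token selection can be sketched roughly as follows. This is a minimal illustration under assumed shapes and names (`aggregate_attention`, `select_core_tokens`, the `top_k` threshold, and the toy tensor sizes are all illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def aggregate_attention(layer_maps):
    """Average attention maps across transformer layers to smooth out
    per-layer attention shifts (the paper's layer-wise aggregation idea).

    layer_maps: list of (num_tokens, H, W) arrays, one per layer.
    """
    return np.mean(np.stack(layer_maps, axis=0), axis=0)

def select_core_tokens(agg_map, top_k=4):
    """Rank tokens by the peak response of their aggregated attention map
    and keep the top_k; these stand in for the 'core tokens' that respond
    most strongly to character structure."""
    scores = agg_map.reshape(agg_map.shape[0], -1).max(axis=1)
    return np.argsort(scores)[::-1][:top_k]

# Toy example: 12 layers, 16 tokens, 8x8 attention maps, with token 3
# given an artificially strong response so it should surface as a core token.
rng = np.random.default_rng(0)
maps = [rng.random((16, 8, 8)) for _ in range(12)]
maps = [m + (np.arange(16)[:, None, None] == 3) * 2.0 for m in maps]
agg = aggregate_attention(maps)
core = select_core_tokens(agg, top_k=2)
print(core)  # token 3 should rank first
```

In the actual method, the aggregated maps of the selected core tokens would then be injected during denoising to fuse the character geometry with the target visual style; this sketch only covers the selection step.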
Problem

Research questions and friction points this paper is trying to address.

multilingual logo generation
character geometry distortion
text-to-image generation
training-free stylization
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free
multilingual logo generation
letter-aware attention control
diffusion transformer
attention map aggregation
Mingyu Kang
UC Berkeley
quantum physics · quantum computing
Hyein Seo
Department of Computer Science, Hanyang University
Yuna Jeong
Department of Artificial Intelligence, Hanyang University
Junhyeong Park
Department of Artificial Intelligence, Hanyang University
Yong Suk Choi
Department of Computer Science, Hanyang University