🤖 AI Summary
Traditional artistic font generation suffers from limited interactivity: existing methods struggle with localized editing, iterative refinement, multi-character typographic coordination, and interpretation of abstract semantic prompts. This paper proposes a training-free interactive artistic font generation framework that combines diffusion models with large language models (LLMs). It introduces a region-aware attention mechanism and a noise fusion technique to enable precise multi-region control and continuous refinement. In addition, the LLM parses open-ended stylistic descriptions and converts them into structured, executable layout instructions, supporting cross-lingual, multi-character composition. Experiments show that the method preserves character legibility and visual creativity in both single-character and multi-character scenarios, while giving users more design flexibility and finer-grained control.
📝 Abstract
Artistic typography aims to stylize input characters with visual effects that are both creative and legible. Traditional approaches rely heavily on manual design, while recent generative models, particularly diffusion-based methods, have enabled automated character stylization. However, existing solutions remain limited in interactivity, lacking support for localized edits, iterative refinement, multi-character composition, and open-ended prompt interpretation. We introduce WordCraft, an interactive artistic typography system that integrates diffusion models to address these limitations. WordCraft features a training-free regional attention mechanism for precise, multi-region generation and a noise blending technique that supports continuous refinement without compromising visual quality. To support flexible, intent-driven generation, we incorporate a large language model to parse and structure both concrete and abstract user prompts. Together, these components allow our framework to synthesize high-quality, stylized typography for single- and multi-character inputs in multiple languages, supporting diverse user-centered workflows. Our system significantly enhances interactivity in artistic typography synthesis, opening up creative possibilities for artists and designers.
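The abstract does not spell out how the noise blending works. As a rough illustration only, the sketch below shows one common way to confine a refinement to a user-selected region: mask-based blending of two noisy latents. The function name `blend_noise`, the 2D list representation, and the binary mask are all assumptions made for this example, not details taken from WordCraft.

```python
import random


def blend_noise(base, edit, mask):
    """Blend two 2D latents: take `edit` where mask is 1, keep `base` where it is 0.

    Hypothetical helper for illustration; the paper's actual blending rule
    is not given in the abstract.
    """
    return [
        [m * e + (1.0 - m) * b for b, e, m in zip(base_row, edit_row, mask_row)]
        for base_row, edit_row, mask_row in zip(base, edit, mask)
    ]


# Toy 4x4 "latents" filled with Gaussian noise (assumed shapes and values).
rng = random.Random(0)
size = 4
base = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
edit = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]

# The user-selected region is the central 2x2 block.
mask = [
    [1.0 if 1 <= r <= 2 and 1 <= c <= 2 else 0.0 for c in range(size)]
    for r in range(size)
]

blended = blend_noise(base, edit, mask)
```

Under these assumptions, values outside the mask are carried over unchanged from the earlier result, so repeated edits to one region leave the rest of the glyph untouched. A soft mask with values between 0 and 1 would give a gradual transition at the region boundary.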