🤖 AI Summary
In artistic typography generation, balancing creativity and legibility remains challenging, particularly when the geometry and texture of a character are designed jointly. This paper proposes VitaGlyph, a training-free, dual-branch diffusion framework that treats each character as a scene composed of a "Subject" (the part that expresses the character's essential concept and may change shape) and a "Surrounding" (related background content rendered without altering the glyph shape), so that the two regions can be rendered under different degrees of geometric transformation. The method combines large language model–driven Knowledge Acquisition, Regional Decomposition of the glyph, structure refinement through Semantic Typography, and Controllable Compositional Generation for texture rendering. Treating the character as a scene keeps it readable while supporting the depiction of multiple customized concepts. Experiments report better artistry and readability than existing methods. The authors state that the code will be released publicly.
📝 Abstract
Artistic typography is a technique for visualizing the meaning of an input character in an imaginative and readable manner. With powerful text-to-image diffusion models, existing methods directly design the overall geometry and texture of the input character, making it challenging to ensure both creativity and legibility. In this paper, we introduce a dual-branch, training-free method, namely VitaGlyph, that enables flexible artistic typography with controllable geometry changes to maintain readability. The key insight of VitaGlyph is to treat the input character as a scene composed of a Subject and a Surrounding, and then to render them under varying degrees of geometric transformation. The Subject flexibly expresses the essential concept of the input character, while the Surrounding enriches the relevant background without altering the shape. Specifically, we implement VitaGlyph through a three-phase framework: (i) Knowledge Acquisition leverages large language models to design text descriptions of the Subject and Surrounding. (ii) Regional Decomposition detects the part that best matches the Subject description and divides the input glyph image into Subject and Surrounding regions. (iii) Typography Stylization first refines the structure of the Subject region via Semantic Typography, and then separately renders the textures of the Subject and Surrounding regions through Controllable Compositional Generation. Experimental results demonstrate that VitaGlyph not only achieves better artistry and readability, but can also depict multiple customized concepts, facilitating more creative and pleasing artistic typography generation. Our code will be made publicly available at https://github.com/Carlofkl/VitaGlyph.
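To make the Subject/Surrounding split in phase (ii) concrete, the following is a minimal sketch and not the authors' implementation. It assumes the glyph is a binary mask and that a detector has already returned a bounding box for the Subject; the helper names `split_glyph` and `composite`, the box format, and the toy glyph are all hypothetical. The diffusion-based rendering of phase (iii) is replaced here by two placeholder characters.

```python
# Illustrative sketch only: names, box format, and "rendering" are assumptions.

def split_glyph(glyph_mask, subject_box):
    """Split a binary glyph mask into Subject and Surrounding masks.

    glyph_mask: list of rows, each a list of 0/1 values (1 = glyph pixel).
    subject_box: (top, left, bottom, right), half-open, standing in for the
        region a detector would return for the Subject description.
    """
    top, left, bottom, right = subject_box
    subject, surrounding = [], []
    for r, row in enumerate(glyph_mask):
        subject_row, surrounding_row = [], []
        for c, filled in enumerate(row):
            inside = top <= r < bottom and left <= c < right
            subject_row.append(bool(filled) and inside)
            surrounding_row.append(bool(filled) and not inside)
        subject.append(subject_row)
        surrounding.append(surrounding_row)
    return subject, surrounding


def composite(subject, surrounding, subject_px="#", surrounding_px=".", background=" "):
    """Compose the two regions into one image, here drawn as text rows.

    Each region gets its own placeholder "texture" so the split is visible.
    """
    rows = []
    for subject_row, surrounding_row in zip(subject, surrounding):
        rows.append("".join(
            subject_px if s else surrounding_px if g else background
            for s, g in zip(subject_row, surrounding_row)
        ))
    return rows


# Toy glyph: a 6x6 canvas with a filled vertical bar in columns 1..4.
glyph = [[0, 1, 1, 1, 1, 0] for _ in range(6)]

# Pretend the detector selected the top half of the glyph as the Subject.
subject, surrounding = split_glyph(glyph, (0, 0, 3, 6))

for line in composite(subject, surrounding):
    print(line)
```

The two masks are disjoint and together cover the original glyph, which is what allows each region to be rendered independently and then recombined without gaps or overlap.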