AI Summary
This work proposes a visual context-driven framework for artistic font generation that overcomes limitations in style diversity and fine-grained control inherent in existing methods. By treating element images as visual contexts, the approach leverages an image inpainting model to transfer styles at the pixel level into glyph regions. A lightweight Context-aware Mask Adapter is introduced to inject shape information, enhancing structural control. Notably, the method is the first to distinguish between object-like and amorphous elements, and it incorporates a training-free attention redirection mechanism for region-aware style modulation. Additionally, edge repainting is employed to improve boundary naturalness. Evaluated under a zero-shot setting, the framework achieves superior fidelity in both structure and texture, enables flexible style mixing, and demonstrates state-of-the-art performance on the ElementFont dataset.
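The in-context generation idea described above can be made concrete as follows: the element image is placed beside a glyph canvas whose stroke region is masked out, so an inpainting model fills the glyph strokes while conditioning on the element as pixel-level context. This is a minimal sketch under assumed names and layout (side-by-side concatenation, a blank glyph canvas), not the paper's actual interface.

```python
# Hypothetical sketch of composing the inpainting input for element-driven
# font generation: element image as visual context, glyph region as the
# mask to be filled. Layout and function name are assumptions.
import numpy as np

def build_inpainting_input(element_img, glyph_mask):
    """Concatenate the element context with a masked glyph canvas.

    element_img: (H, W, 3) float array in [0, 1], the style reference.
    glyph_mask:  (H, W) boolean array, True inside the glyph strokes.
    Returns (canvas, inpaint_mask): the side-by-side input image and the
    mask telling the inpainting model which pixels to synthesize.
    """
    h, w, _ = element_img.shape
    glyph_canvas = np.ones((h, w, 3))                 # blank right half for the glyph
    canvas = np.concatenate([element_img, glyph_canvas], axis=1)  # (H, 2W, 3)
    inpaint_mask = np.zeros((h, 2 * w), dtype=bool)
    inpaint_mask[:, w:] = glyph_mask                  # only the glyph region is repainted
    return canvas, inpaint_mask
```

An inpainting model given `(canvas, inpaint_mask)` then synthesizes only the glyph strokes, borrowing texture from the adjacent element context.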
Abstract
Artistic font generation aims to synthesize stylized glyphs based on a reference style. However, existing approaches suffer from limited style diversity and coarse control. In this work, we explore the potential of element-driven artistic font generation. Elements are the fundamental visual units of a font, serving as reference images for the desired style. Conceptually, we categorize elements into object elements (e.g., flowers or stones) with distinct structures and amorphous elements (e.g., flames or clouds) with unstructured textures. We introduce FontCrafter, an element-driven framework for font creation, and construct a large-scale dataset, ElementFont, which contains diverse element types and high-quality glyph images. However, achieving high-fidelity reconstruction of both texture and structure of reference elements remains challenging. To address this, we propose an in-context generation strategy that treats element images as visual context and uses an inpainting model to transfer element styles into glyph regions at the pixel level. To further control glyph shapes, we design a lightweight Context-aware Mask Adapter (CMA) that injects shape information. Moreover, a training-free attention redirection mechanism enables region-aware style control and suppresses stroke hallucination. In addition, edge repainting is applied to make boundaries more natural. Extensive experiments demonstrate that FontCrafter achieves strong zero-shot generation performance, particularly in preserving structural and textural fidelity, while also supporting flexible controls such as style mixture.
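The training-free attention redirection mentioned in the abstract can be illustrated with a toy single-head attention step: queries inside the glyph region are restricted to keys in the element-context region, which steers style toward the reference and discourages stroke hallucination. This is a simplified sketch, assuming the element context and glyph canvas share one token sequence; the function name, masks, and bias scheme are illustrative assumptions, not the paper's exact mechanism.

```python
# Hypothetical sketch of region-aware attention redirection: glyph-region
# queries are biased so they attend only to element-context keys.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def redirected_attention(q, k, v, glyph_query_mask, context_key_mask):
    """Single-head attention with region-aware redirection.

    q, k, v: (n_tokens, d) arrays over the joint context+glyph sequence.
    glyph_query_mask, context_key_mask: boolean (n_tokens,) arrays marking
    glyph-region queries and element-context keys, respectively.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (n_q, n_k)
    bias = np.zeros_like(scores)
    # Suppress glyph-query attention to any key outside the element context.
    bias[np.ix_(glyph_query_mask, ~context_key_mask)] = -1e9
    attn = softmax(scores + bias, axis=-1)
    return attn @ v
```

Because the redirection is only an additive attention bias, it can be applied at inference time without retraining the underlying inpainting model.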