🤖 AI Summary
Current text-to-image diffusion models struggle to generate high-fidelity typographic images for unseen fonts efficiently, typically requiring tens of minutes of fine-tuning, which makes real-time customization impractical. To address this, the authors propose FontAdapter, a two-stage curriculum learning framework: (1) learning to extract font-style attributes from isolated glyphs, and (2) injecting the learned style into diverse natural backgrounds. The method enables zero-shot inference within seconds from a single reference glyph image, combining a glyph-specific feature encoder, a background-aware style injection module, and stage-specific synthetic training data built from large-scale online fonts. It supports diverse applications, including visual text editing, cross-lingual font transfer, and font style blending. On unseen fonts, it achieves high-fidelity text rendering in seconds, significantly outperforming state-of-the-art methods in both quality and speed, and demonstrates strong generalization and practical utility for on-demand typography generation.
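The two-stage curriculum above can be illustrated with a deliberately tiny sketch: stage 1 fits an "encoder" that maps isolated glyphs to font-style attributes, then stage 2 freezes it and fits an "injection" map that blends the extracted style into background features. Everything here (the linear maps, the synthetic arrays, the least-squares "training") is an illustrative stand-in chosen for runnability, not the paper's actual diffusion-based architecture or API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, one set per curriculum stage (toy stand-ins for the
# paper's stage-specific synthetic datasets built from online fonts).
glyphs = rng.normal(size=(64, 16))           # isolated glyph features
style_true = rng.normal(size=(16, 4))        # hidden glyph->style relation
styles_gt = glyphs @ style_true              # ground-truth font-style attributes
backgrounds = rng.normal(size=(64, 8))       # natural-background features
mix_true = rng.normal(size=(4, 8))           # hidden style->composite relation
composites_gt = backgrounds + styles_gt @ mix_true  # toy "text on background"

# Stage 1: learn a glyph-specific encoder that extracts font-style
# attributes from isolated glyphs (here: a least-squares linear map).
W, *_ = np.linalg.lstsq(glyphs, styles_gt, rcond=None)
stage1_err = float(np.mean((glyphs @ W - styles_gt) ** 2))

# Stage 2: freeze the stage-1 encoder and learn a background-aware
# injection map that blends the extracted style into backgrounds.
styles = glyphs @ W                          # frozen stage-1 encoder output
V, *_ = np.linalg.lstsq(styles, composites_gt - backgrounds, rcond=None)
stage2_err = float(np.mean((backgrounds + styles @ V - composites_gt) ** 2))

# Zero-shot "inference": a single unseen reference glyph conditions the
# rendering, with no further fitting.
new_glyph = rng.normal(size=(1, 16))
rendered = backgrounds[:1] + (new_glyph @ W) @ V

print(stage1_err < 1e-10, stage2_err < 1e-10)  # → True True
```

The point of the staging is visible even in this toy: stage 2 never touches the glyph-to-style map, so the style extractor learned on isolated glyphs transfers unchanged to the composition task, mirroring the curriculum's extract-then-inject ordering.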
📝 Abstract
Text-to-image diffusion models have significantly improved the seamless integration of visual text into diverse image contexts. Recent approaches further improve control over font styles through fine-tuning with predefined font dictionaries. However, adapting to unseen fonts outside these presets is computationally expensive, often requiring tens of minutes, making real-time customization impractical. In this paper, we present FontAdapter, a framework that enables visual text generation in unseen fonts within seconds, conditioned on a reference glyph image. In developing this, we find that direct training on font datasets fails to capture nuanced font attributes, limiting generalization to new glyphs. To overcome this, we propose a two-stage curriculum learning approach: FontAdapter first learns to extract font attributes from isolated glyphs and then integrates these styles into diverse natural backgrounds. To support this two-stage training scheme, we construct synthetic datasets tailored to each stage, leveraging large-scale online fonts effectively. Experiments demonstrate that FontAdapter enables high-quality, robust font customization across unseen fonts without additional fine-tuning at inference time. Furthermore, it supports visual text editing, font style blending, and cross-lingual font transfer, positioning FontAdapter as a versatile framework for font customization tasks.