Text-Conditioned Diffusion Model for High-Fidelity Korean Font Generation

📅 2025-04-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing GAN/VAE-based approaches for Korean font generation—especially for handwritten and printed styles—suffer from training instability, mode collapse, loss of fine-grained details, and poor generalization to unseen characters. To address these challenges, this paper proposes the first diffusion-based single-shot Korean font generation framework. Our key contributions are: (1) a novel phoneme-level text encoder that enables accurate semantic modeling of out-of-vocabulary Korean characters; (2) a coupled architecture integrating a pretrained DG-FONT style encoder with LPIPS-based perceptual loss to ensure both global style consistency and local stroke fidelity; and (3) a progressive denoising mechanism enabling high-fidelity generation of over 2,000 Hangul characters from only one reference image. Extensive experiments demonstrate that our method significantly outperforms GAN/VAE baselines in structural accuracy, texture detail preservation, and cross-character style consistency, while supporting practical, multi-style, one-click font generation in real-world scenarios.

Technology Category

Application Category

📝 Abstract
Automatic font generation (AFG) is the process of creating a new font using only a few examples of the style images. Generating fonts for complex languages like Korean and Chinese, particularly in handwritten styles, presents significant challenges. Traditional AFGs, like Generative adversarial networks (GANs) and Variational Auto-Encoders (VAEs), are usually unstable during training and often face mode collapse problems. They also struggle to capture fine details within font images. To address these problems, we present a diffusion-based AFG method which generates high-quality, diverse Korean font images using only a single reference image, focusing on handwritten and printed styles. Our approach refines noisy images incrementally, ensuring stable training and visually appealing results. A key innovation is our text encoder, which processes phonetic representations to generate accurate and contextually correct characters, even for unseen characters. We used a pre-trained style encoder from DG FONT to effectively and accurately encode the style images. To further enhance the generation quality, we used perceptual loss that guides the model to focus on the global style of generated images. Experimental results on over 2000 Korean characters demonstrate that our model consistently generates accurate and detailed font images and outperforms benchmark methods, making it a reliable tool for generating authentic Korean fonts across different styles.
Problem

Research questions and friction points this paper is trying to address.

Generating high-fidelity Korean fonts from single reference images
Overcoming instability and mode collapse in traditional AFG methods
Ensuring accurate character generation for unseen Korean characters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based AFG for high-quality Korean fonts
Text encoder processes phonetic representations accurately
Pre-trained style encoder enhances generation quality
🔎 Similar Papers
No similar papers found.
Abdul Sami
Abdul Sami
Soongsil University
Diffusion ModelsImage GenerationComputer VisionDeep Learning
Avinash Kumar
Avinash Kumar
Research Assistant Soongsil University, Seoul, South Korea
Machine LearningDeep LearningComputer Vision GAN's
I
Irfanullah Memon
School of Computer Science and Engineering, Soongsil University, Seoul 06978, Korea
Y
Youngwon Jo
School of Computer Science and Engineering, Soongsil University, Seoul 06978, Korea
M
Muhammad Rizwan
School of Computer Science and Engineering, Soongsil University, Seoul 06978, Korea
J
Jaeyoung Choi
School of Computer Science and Engineering, Soongsil University, Seoul 06978, Korea