🤖 AI Summary
Handwritten text synthesis faces two key limitations: conventional convolutional architectures model long-range spatial dependencies poorly, and neglecting frequency-domain information distorts fine-grained stylistic details. To address both, this paper proposes FW-GAN, a frequency-guided generative adversarial network for one-shot handwriting synthesis. The generator uses a phase-aware Wave-MLP to capture long-range spatial correlations, while a frequency-guided discriminator and a Frequency Distribution Loss explicitly enforce fidelity of high-frequency stroke details and global structural consistency. Evaluated on low-resource Vietnamese and English handwriting datasets, FW-GAN generates high-fidelity, style-consistent text images from a single reference sample. Experiments show that the synthetic data significantly improves downstream handwriting recognition performance, supporting the use of frequency-domain modeling for data augmentation in handwriting synthesis.
📝 Abstract
Labeled handwriting data is often scarce, which limits the effectiveness of recognition systems that require diverse, style-consistent training samples. Handwriting synthesis offers a promising solution by generating artificial data to augment training. However, current methods face two major limitations. First, most are built on conventional convolutional architectures, which struggle to model long-range dependencies and complex stroke patterns. Second, they largely ignore frequency information, which is essential for capturing fine-grained stylistic and structural details in handwriting. To address these challenges, we propose FW-GAN, a one-shot handwriting synthesis framework that generates realistic, writer-consistent text from a single example. Our generator integrates a phase-aware Wave-MLP to better capture spatial relationships while preserving subtle stylistic cues. We further introduce a frequency-guided discriminator that leverages high-frequency components to better distinguish real from generated samples. In addition, we propose a novel Frequency Distribution Loss that aligns the frequency characteristics of synthetic and real handwriting, thereby enhancing visual fidelity. Experiments on Vietnamese and English handwriting datasets demonstrate that FW-GAN generates high-quality, style-consistent handwriting, making it a valuable tool for augmenting data in low-resource handwritten text recognition (HTR) pipelines. Official implementation is available at https://github.com/DAIR-Group/FW-GAN
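To make the frequency-domain idea concrete, the following is a minimal NumPy sketch of the two ingredients mentioned above: comparing the frequency content of real and synthetic images, and isolating high-frequency components. It is an illustration under stated assumptions, not the FW-GAN implementation: the function names (`amplitude_spectrum`, `frequency_distance`, `high_pass`), the choice of a mean absolute difference between log-amplitude spectra, and the circular high-pass mask are all hypothetical stand-ins, since the abstract does not specify how the Frequency Distribution Loss or the discriminator input are defined.

```python
import numpy as np


def amplitude_spectrum(img):
    """Log-amplitude spectrum of a 2-D grayscale image, zero frequency centered."""
    spec = np.fft.fftshift(np.fft.fft2(img))
    return np.log1p(np.abs(spec))


def frequency_distance(real, fake):
    """Mean absolute difference between log-amplitude spectra.

    Assumption: a simple stand-in for a frequency-alignment loss,
    not the loss defined in the paper.
    """
    diff = amplitude_spectrum(real) - amplitude_spectrum(fake)
    return float(np.mean(np.abs(diff)))


def high_pass(img, radius):
    """Remove all frequencies within `radius` of the zero frequency.

    Assumption: one plausible way to expose fine stroke detail to a
    discriminator; the paper's actual filtering may differ.
    """
    h, w = img.shape
    spec = np.fft.fftshift(np.fft.fft2(img))
    yy, xx = np.ogrid[:h, :w]
    # After fftshift the zero frequency sits at index (h // 2, w // 2).
    keep = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 > radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec * keep)))


# Toy usage: random noise stands in for a handwriting image, and a
# horizontally smoothed copy stands in for an over-smooth generator output.
rng = np.random.default_rng(0)
real = rng.random((32, 128))
fake = (real + np.roll(real, 1, axis=1) + np.roll(real, -1, axis=1)) / 3.0

same = frequency_distance(real, real)      # identical images
smoothed = frequency_distance(real, fake)  # smoothing suppresses high frequencies
detail = high_pass(real, radius=4)         # same shape as the input
```

In a training setup, a term like `frequency_distance` would be computed with a differentiable FFT (for example in a deep learning framework) and added to the adversarial objective; the NumPy version here only shows what is being measured.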