Decoupling Layout from Glyph in Online Chinese Handwriting Generation

📅 2024-10-03
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the poor controllability and weak coherence of online Chinese handwritten text-line generation, which it attributes to the coupling of layout and glyph modeling. We propose a layout-glyph disentangled generation paradigm with two components: (1) a context-aware autoregressive layout generator that models character positions and their sequential dependencies; (2) a diffusion-based font synthesizer with a 1D U-Net denoiser, conditioned on a multi-scale calligraphy style encoder, for fine-grained style control and high-fidelity glyph synthesis. Evaluated on CASIA-OLHWDB, the framework generates complete text lines with accurate spatial structure and consistent style that are hard to distinguish from genuine handwriting. Both qualitative assessment and quantitative metrics (e.g., layout accuracy, style consistency, and perceptual fidelity) surpass state-of-the-art baselines. This work establishes a controllable paradigm for online handwritten text-line generation.

📝 Abstract
Text plays a crucial role in the transmission of human civilization, and teaching machines to generate online handwritten text in various styles presents an interesting and significant challenge. However, most prior work has concentrated on generating individual Chinese fonts, leaving complete text line generation largely unexplored. In this paper, we identify that text lines can naturally be divided into two components: layout and glyphs. Based on this division, we design a text line layout generator coupled with a diffusion-based stylized font synthesizer to address this challenge hierarchically. More concretely, the layout generator performs in-context-like learning based on the text content and the provided style references to generate the position of each glyph autoregressively. Meanwhile, the font synthesizer, which consists of a character embedding dictionary, a multi-scale calligraphy style encoder, and a 1D U-Net-based diffusion denoiser, generates each font at its position while imitating the calligraphy style extracted from the given style references. Qualitative and quantitative experiments on CASIA-OLHWDB demonstrate that our method is capable of generating structurally correct and indistinguishable imitation samples.
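The two-stage split described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: `Box`, `generate_layout`, `synthesize_glyph`, `place_glyph`, and `generate_text_line` are hypothetical names, the layout step is a toy rule that uses only the previous box and simple statistics of the reference boxes, and the glyph step is a fixed placeholder stroke pattern standing in for the diffusion-based synthesizer. It only shows how positions can be produced autoregressively first and glyphs placed into them afterwards.

```python
import random
from dataclasses import dataclass


@dataclass
class Box:
    x: float  # left edge of the character box
    y: float  # top edge of the character box
    w: float  # width
    h: float  # height


def generate_layout(text, ref_boxes, seed=0):
    """Toy autoregressive layout (assumption, not the paper's generator):
    box i depends on box i-1 and on size/gap statistics of the references."""
    rng = random.Random(seed)
    mean_w = sum(b.w for b in ref_boxes) / len(ref_boxes)
    mean_h = sum(b.h for b in ref_boxes) / len(ref_boxes)
    gaps = [b.x - (a.x + a.w) for a, b in zip(ref_boxes, ref_boxes[1:])]
    mean_gap = max(0.0, sum(gaps) / len(gaps)) if gaps else 0.0
    boxes = []
    for _ in text:
        prev = boxes[-1] if boxes else None
        x = 0.0 if prev is None else prev.x + prev.w + mean_gap
        y = (0.0 if prev is None else prev.y) + rng.uniform(-0.02, 0.02) * mean_h
        w = mean_w * (1.0 + rng.uniform(-0.1, 0.1))
        h = mean_h * (1.0 + rng.uniform(-0.1, 0.1))
        boxes.append(Box(x, y, w, h))
    return boxes


def synthesize_glyph(char):
    """Placeholder glyph in the unit square as (u, v, pen_up) points.
    A real synthesizer would condition on the character and the style."""
    return [(0.1, 0.5, 0), (0.9, 0.5, 1), (0.5, 0.1, 0), (0.5, 0.9, 1)]


def place_glyph(points, box):
    """Map unit-square glyph points into the box chosen by the layout stage."""
    return [(box.x + u * box.w, box.y + v * box.h, pen) for u, v, pen in points]


def generate_text_line(text, ref_boxes, seed=0):
    boxes = generate_layout(text, ref_boxes, seed)
    return [(ch, box, place_glyph(synthesize_glyph(ch), box))
            for ch, box in zip(text, boxes)]


# Hypothetical style reference: three character boxes from one writer.
refs = [Box(0.0, 0.0, 1.0, 1.2), Box(1.2, 0.05, 0.9, 1.1), Box(2.3, 0.0, 1.1, 1.2)]
line = generate_text_line("你好世界", refs)
```

Because the two stages only communicate through the boxes, either one can be swapped out independently, which is the practical appeal of the decoupling.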
Problem

Research questions and friction points this paper is trying to address.

Decouples layout and glyph generation
Generates complete Chinese text lines
Imitates calligraphy styles accurately
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples layout from glyph generation
Uses diffusion-based stylized font synthesizer
Implements autoregressive text line layout generator
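For readers unfamiliar with diffusion on pen trajectories, the sketch below shows the standard DDPM forward (noising) step applied to a sequence of pen offsets. It is an illustration under assumptions, not the paper's configuration: the linear beta schedule, the 1000 steps, the `(dx, dy)` point format, and the function names are all hypothetical choices, and the denoiser itself (a 1D U-Net in the paper) is not implemented here.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Forward step: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.

    x0 is a list of (dx, dy) pen offsets; returns the noised sequence and
    the noise that a denoiser would be trained to predict.
    """
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    eps = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in x0]
    xt = [(signal * x + noise * ex, signal * y + noise * ey)
          for (x, y), (ex, ey) in zip(x0, eps)]
    return xt, eps


alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
stroke = [(0.1, 0.0), (0.1, 0.05), (0.0, 0.1), (-0.05, 0.1)]  # toy trajectory
noised, eps = q_sample(stroke, t=500, alpha_bars=alpha_bars, rng=random.Random(0))
```

Training would then ask the denoiser to recover `eps` from `noised`, the step index, and the conditioning signals (character and style), which are omitted in this sketch.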
Min-Si Ren
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Yan-Ming Zhang
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Yi Chen
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China