CalliMaster: Mastering Page-level Chinese Calligraphy via Layout-guided Spatial Planning

Date: 2026-03-12
Citations: 0
Influential: 0
AI Summary
This work addresses the challenge of page-level Chinese calligraphy generation, which requires simultaneous fidelity to fine glyph details and coherent global layout, a balance existing methods struggle to achieve. To this end, we propose CalliMaster, a novel framework that decouples spatial planning from content synthesis through a coarse-to-fine "text → layout → image" pipeline within a unified multimodal diffusion Transformer. The model first predicts character bounding boxes and then leverages this geometric layout as a conditioning prompt to synthesize high-fidelity calligraphic images. Crucially, the layout is treated as an editable constraint, enabling semantic reordering, scaling, and positional adjustments while automatically harmonizing negative space and brushstroke dynamics. Our approach achieves state-of-the-art generation quality and supports practical applications such as controllable editing, digital artifact restoration, and handwriting authentication.
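
As a rough illustration of what treating the layout as an editable constraint could look like, the sketch below represents a page layout as a list of character bounding boxes and applies a scale and a shift to one of them. The `CharBox` schema and the `scale_box`, `move_box`, and `edit_layout` helpers are hypothetical names chosen for this example; the summary only states that the layout consists of character bounding boxes, not how they are encoded. In the described framework the edited layout would then be handed back to the synthesis stage, which is not modeled here.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class CharBox:
    """One character's bounding box in page coordinates (hypothetical schema)."""
    char: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def scale_box(box: CharBox, factor: float) -> CharBox:
    """Scale a box about its center so the character grows or shrinks in place."""
    cx, cy = box.x + box.w / 2, box.y + box.h / 2
    new_w, new_h = box.w * factor, box.h * factor
    return replace(box, x=cx - new_w / 2, y=cy - new_h / 2, w=new_w, h=new_h)


def move_box(box: CharBox, dx: float, dy: float) -> CharBox:
    """Translate a box by (dx, dy)."""
    return replace(box, x=box.x + dx, y=box.y + dy)


def edit_layout(layout: List[CharBox], index: int, *,
                factor: float = 1.0, dx: float = 0.0, dy: float = 0.0) -> List[CharBox]:
    """Return a new layout with one box scaled and moved; other boxes are untouched."""
    edited = move_box(scale_box(layout[index], factor), dx, dy)
    return layout[:index] + [edited] + layout[index + 1:]


# Two characters stacked in one vertical column (made-up coordinates).
layout = [CharBox("永", 100, 40, 60, 60), CharBox("和", 100, 110, 60, 60)]
# Enlarge the first character by 1.5x and shift it up by 10 units.
edited = edit_layout(layout, 0, factor=1.5, dy=-10)
```

The helpers return new objects instead of mutating the input, so the original layout stays available for comparison or undo. Rebalancing the neighboring characters and the surrounding blank space is what the summary attributes to the model itself, so the sketch deliberately leaves the other boxes unchanged.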

Abstract
Page-level calligraphy synthesis requires balancing glyph precision with layout composition. Existing character models lack spatial context, while page-level methods often compromise brushwork detail. In this paper, we present CalliMaster, a unified framework for controllable generation and editing that resolves this conflict by decoupling spatial planning from content synthesis. Inspired by the human cognitive process of "planning before writing", we introduce a coarse-to-fine pipeline (Text → Layout → Image) to tackle the combinatorial complexity of page-scale synthesis. Operating within a single Multimodal Diffusion Transformer, a spatial planning stage first predicts character bounding boxes to establish the global spatial arrangement. This intermediate layout then serves as a geometric prompt for the content synthesis stage, where the same network utilizes flow-matching to render high-fidelity brushwork. Beyond achieving state-of-the-art generation quality, this disentanglement supports versatile downstream capabilities. By treating the layout as a modifiable constraint, CalliMaster enables controllable semantic re-planning: users can resize or reposition characters while the model automatically harmonizes the surrounding void space and brush momentum. Furthermore, we demonstrate the framework's extensibility to artifact restoration and forensic analysis, providing a comprehensive tool for digital cultural heritage.
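
For readers unfamiliar with flow matching, the following is a minimal, generic sketch of how sampling typically works: a velocity field is integrated from noise at t = 0 to a sample at t = 1. It is not the paper's implementation. The abstract does not state the probability path or the solver, so the straight-line path and fixed-step Euler integrator are assumptions, and `make_toy_velocity` is a hypothetical stand-in for the trained network, which in the described framework would predict the velocity conditioned on the text and the layout.

```python
import random
from typing import Callable, List

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target: Vector) -> Callable[[Vector, float], Vector]:
    """Stand-in for a trained model (assumption): the exact velocity of the
    straight-line path x_t = (1 - t) * x0 + t * target, written in terms of x_t."""
    def velocity(x: Vector, t: float) -> Vector:
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4)]   # starting point at t=0
target = [0.2, 0.8, 0.5, 0.1]                        # toy "pixel" values
sample = euler_sample(make_toy_velocity(target), noise, steps=10)
```

Because the toy velocity always points straight at a known target, the integration lands on that target regardless of the starting noise. A learned velocity field has no such closed form, which is why the step count and solver choice matter in practice.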
Problem

Research questions and friction points this paper is trying to address.

page-level calligraphy synthesis
spatial planning
layout composition
brushwork detail
character layout
Innovation

Methods, ideas, or system contributions that make the work stand out.

layout-guided synthesis
spatial planning
Chinese calligraphy generation
multimodal diffusion transformer
controllable editing
Tianshuo Xu
The Hong Kong University of Science and Technology (Guangzhou)
Diffusion Models, Autonomous Driving, Low-Level Computer Vision
Tiantian Hong
Faculty of Engineering and IT, University of Technology Sydney
Zhifei Chen
The Hong Kong University of Science and Technology (Guangzhou)
Fei Chao
Xiamen University
Ying-cong Chen
The Hong Kong University of Science and Technology