UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Chinese calligraphy generation methods struggle to deliver character-level fidelity and page-level aesthetics (e.g., stroke connectivity and inter-character spacing) at the same time: they either produce high-quality isolated characters or sacrifice structural correctness for layout coherence. This paper proposes UniCalli, the first unified diffusion framework for column-level calligraphy generation and recognition. Recognition constrains the generator to preserve character structure, while generation supplies style and layout priors, so the two tasks share concept-level features. The framework uses asymmetric noising and a rasterized bounding-box map to model spatial priors explicitly, and it is trained on a mix of synthetic, labeled, and unlabeled data. It improves both tasks, especially in limited-data regimes, achieving state-of-the-art generation quality with better ligature continuity and layout fidelity, alongside stronger recognition accuracy. The framework also extends to other ancient scripts, including oracle bone inscriptions and Egyptian hieroglyphs.

📝 Abstract
Computational replication of Chinese calligraphy remains challenging. Existing methods falter, either creating high-quality isolated characters while ignoring page-level aesthetics like ligatures and spacing, or attempting page synthesis at the expense of calligraphic correctness. We introduce UniCalli, a unified diffusion framework for column-level recognition and generation. Training both tasks jointly is deliberate: recognition constrains the generator to preserve character structure, while generation provides style and layout priors. This synergy fosters concept-level abstractions that improve both tasks, especially in limited-data regimes. We curated a dataset of over 8,000 digitized pieces, with ~4,000 densely annotated. UniCalli employs asymmetric noising and a rasterized box map for spatial priors, trained on a mix of synthetic, labeled, and unlabeled data. The model achieves state-of-the-art generative quality with superior ligature continuity and layout fidelity, alongside stronger recognition. The framework successfully extends to other ancient scripts, including oracle bone inscriptions and Egyptian hieroglyphs. Code and data are available at https://github.com/EnVision-Research/UniCalli.
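
The abstract names a "rasterized box map" as a spatial prior but does not say how it is encoded. As a rough illustration only, the sketch below shows one plausible reading: per-character bounding boxes painted into an image-sized index map. The function name `rasterize_boxes`, the `(x0, y0, x1, y1)` pixel convention, and the example boxes are all assumptions made for this sketch, not details taken from the paper or its code.

```python
import numpy as np

def rasterize_boxes(boxes, height, width):
    """Paint axis-aligned boxes (x0, y0, x1, y1) into an H x W index map.

    Hypothetical encoding: 0 is background, k marks the k-th box (1-based).
    The right and bottom edges (x1, y1) are exclusive.
    """
    box_map = np.zeros((height, width), dtype=np.int32)
    for k, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        # Clip to the canvas so partially out-of-range boxes are still drawn.
        x0c, x1c = max(0, x0), min(width, x1)
        y0c, y1c = max(0, y0), min(height, y1)
        box_map[y0c:y1c, x0c:x1c] = k
    return box_map

# Made-up column of three characters stacked top to bottom.
boxes = [(2, 0, 14, 10), (3, 11, 13, 20), (1, 21, 15, 32)]
box_map = rasterize_boxes(boxes, height=32, width=16)
```

A map like this could be stacked with the image as an extra channel so that a model sees character positions and spacing directly; whether UniCalli does so in this form is not stated in the abstract.
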
Problem

Research questions and friction points this paper is trying to address.

Generating column-level Chinese calligraphy with structural accuracy
Recognizing calligraphy characters while preserving stylistic layout
Unifying generation and recognition to overcome limited-data challenges
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified diffusion framework for column-level calligraphy
Joint training enhances character structure and style
Asymmetric noising with rasterized box map for spatial priors
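
The summary does not spell out what "asymmetric noising" means. One common way to let a single diffusion model serve two tasks is to give each modality its own noise level, keeping the conditioning modality clean and noising only the one to be predicted. The sketch below illustrates that idea under stated assumptions: the linear blend in `add_noise`, the task names, and the array shapes are hypothetical choices for illustration and may differ from UniCalli's actual schedule.

```python
import numpy as np

def add_noise(x, t, rng):
    """Blend x with Gaussian noise: t=0 returns x unchanged, t=1 is pure noise."""
    eps = rng.standard_normal(x.shape)
    return (1.0 - t) * x + t * eps

def asymmetric_noise(image, label, task, rng):
    """Assumed scheme: noise only the modality that the task must predict."""
    t = rng.uniform(0.0, 1.0)
    if task == "generation":       # clean text condition, noisy image
        t_image, t_label = t, 0.0
    elif task == "recognition":    # clean image condition, noisy text
        t_image, t_label = 0.0, t
    else:
        raise ValueError(f"unknown task: {task}")
    noisy_image = add_noise(image, t_image, rng)
    noisy_label = add_noise(label, t_label, rng)
    return noisy_image, noisy_label, (t_image, t_label)

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 16))  # stand-in for a column image latent
label = rng.standard_normal((3, 8))    # stand-in for character embeddings

rec_image, rec_label, rec_levels = asymmetric_noise(image, label, "recognition", rng)
gen_image, gen_label, gen_levels = asymmetric_noise(image, label, "generation", rng)
```

In this sketch, switching `task` is the only difference between a recognition and a generation training step, which is one way a shared network could be pushed toward features useful for both.
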