🤖 AI Summary
Existing Chinese calligraphy datasets are scarce and predominantly annotated only at the character level, lacking cultural metadata—such as stylistic category, dynasty, and calligrapher—which hinders both accurate recognition and historical analysis. Method: We introduce MCCD, the first fine-grained, triple-attribute (style/dynasty/calligrapher) annotated dataset of individual Chinese calligraphic characters, comprising 329,715 images across 7,765 classes. We propose a novel annotation paradigm integrating expert verification with historical textual scholarship, and establish comprehensive single-task and multi-task deep learning benchmarks. Contribution/Results: Experiments reveal that stroke complexity and attribute coupling significantly degrade recognition performance; we report baseline results across multiple challenging subsets. MCCD bridges the gap from character-level recognition to cultural-attribute understanding, providing a foundational resource for calligraphic AI and diachronic studies of Chinese character evolution.
📝 Abstract
Research on the attribute information of calligraphy, such as styles, dynasties, and calligraphers, holds significant cultural and historical value. However, Chinese calligraphic styles have evolved dramatically across dynasties and through the individual touches of calligraphers, making it highly challenging to accurately recognize characters and their attributes. Furthermore, existing calligraphic datasets are extremely scarce, and most provide only character-level annotations without additional attribute information. This limitation has significantly hindered the in-depth study of Chinese calligraphy. To fill this gap, we present a novel Multi-Attribute Chinese Calligraphy Character Dataset (MCCD). The dataset encompasses 7,765 categories with a total of 329,715 isolated image samples of Chinese calligraphy characters, and three additional subsets are extracted according to the labels for three attributes: script style (10 types), dynasty (15 periods), and calligrapher (142 individuals). The rich multi-attribute annotations make MCCD well suited to diverse research tasks, including calligraphic character recognition, writer identification, and evolutionary studies of Chinese characters. We establish benchmark performance through single-task and multi-task recognition experiments on MCCD and all of its subsets. The experimental results demonstrate that the complexity of the stroke structure of calligraphic characters and the interplay between their different attributes substantially increase the difficulty of accurate recognition. MCCD not only fills a void in the availability of detailed calligraphy datasets but also provides a valuable resource for advancing research in Chinese calligraphy and related fields. The dataset is available at https://github.com/SCUT-DLVCLab/MCCD.