Chronicles-OCR: A Cross-Temporal Perception Benchmark for the Evolutionary Trajectory of Chinese Characters

📅 2026-05-12

🤖 AI Summary
This work addresses the lack of systematic evaluation of Vision Large Language Models (VLLMs) on the cross-era morphological evolution of Chinese characters; existing datasets are typically confined to a single historical period. To bridge this gap, we introduce Chronicles-OCR, a cross-temporal visual perception benchmark, presented as the first of its kind, that covers the complete evolutionary trajectory of the "Seven Scripts" of Chinese writing. In collaboration with domain experts, we curated a balanced set of 2,800 images spanning diverse physical media, from tortoise shells to paper-based calligraphy. Using a Stage-Adaptive Annotation Paradigm, the benchmark defines four quantitative tasks: cross-period character spotting, fine-grained archaic character recognition via visual referring, ancient text parsing, and script classification. These tasks are designed to isolate visual perception from semantic reasoning. Chronicles-OCR thereby exposes limitations of current VLLMs in historical script perception and provides a foundation for evaluating evolution-aware visual understanding.
📝 Abstract
Vision Large Language Models (VLLMs) have achieved remarkable success in modern text-rich visual understanding. However, their perceptual robustness in the face of the continuous morphological evolution of historical writing systems remains largely unexplored. Existing ancient text datasets typically focus on isolated historical periods, failing to capture the systematic visual distribution shifts spanning thousands of years. To bridge this gap and empower Digital Humanities, we introduce Chronicles-OCR, the first comprehensive benchmark specifically designed to evaluate the cross-temporal visual perception capabilities of VLLMs across the complete evolutionary trajectory of Chinese characters, known as the Seven Chinese Scripts. Curated in collaboration with top-tier institutional domain experts, the dataset comprises 2,800 strictly balanced images encompassing highly diverse physical media, ranging from tortoise shells to paper-based calligraphy. To accommodate the drastic morphological and topological variations across different historical stages, we propose a novel Stage-Adaptive Annotation Paradigm. Based on this, Chronicles-OCR formulates four rigorous quantitative tasks: cross-period character spotting, fine-grained archaic character recognition via visual referring, ancient text parsing, and script classification. By isolating visual perception from semantic reasoning, Chronicles-OCR provides an authoritative platform to expose the limitations of current VLLMs, paving the way for robust, evolution-aware historical text perception. Chronicles-OCR is publicly available at https://github.com/VirtualLUOUCAS/Chronicles-OCR.
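To make the script classification task concrete, the following is a minimal sketch of how per-script accuracy could be computed from model predictions. It is not the benchmark's official evaluation code: the function names, the `(gold, predicted)` record format, and the script labels used in the example are all assumptions for illustration, and the released repository may define labels and metrics differently.

```python
from collections import defaultdict


def per_script_accuracy(records):
    """Return accuracy for each gold script label.

    `records` is an iterable of (gold_script, predicted_script) pairs.
    This record format is hypothetical, not the benchmark's own schema.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for gold, pred in records:
        total[gold] += 1
        if pred == gold:
            correct[gold] += 1
    return {script: correct[script] / total[script] for script in total}


def macro_accuracy(records):
    """Average the per-script accuracies so each script counts equally."""
    per_script = per_script_accuracy(records)
    return sum(per_script.values()) / len(per_script)


# Toy example with illustrative (assumed) label names.
example = [
    ("oracle_bone", "oracle_bone"),
    ("oracle_bone", "bronze"),
    ("regular", "regular"),
    ("regular", "regular"),
]
scores = per_script_accuracy(example)
overall = macro_accuracy(example)
```

If the 2,800 images are balanced across the seven scripts (which would be 400 per script, an assumption about how "strictly balanced" is defined), macro and overall accuracy coincide, so the per-script breakdown is the more informative output: it shows at which historical stage a model's perception degrades.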
Problem

Research questions and friction points this paper is trying to address.

cross-temporal perception
Chinese character evolution
historical text understanding
visual distribution shift
Vision Large Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-temporal perception
Stage-Adaptive Annotation
Chinese character evolution
Vision Large Language Models
Historical OCR
Gengluo Li
Institute of Information Engineering, Chinese Academy of Sciences
Shangpin Peng
Harbin Institute of Technology, Shenzhen
Artificial Intelligence, LLM, Preference Optimization
Xingyu Wan
Tencent
Chengquan Zhang
Unknown affiliation
computer vision, application of deep learning
Hao Feng
Tencent
Xin Xu
Tencent
Pian Wu
Tencent
Bang Li
Anyang Normal University
Zengmao Ding
Anyang Normal University
Yongge Liu
Anyang Normal University
Yipei Ye
The Palace Museum
Yang Yang
The Palace Museum
Zhan Shu
Professor, University of Alberta
control, control engineering, control theory
Guojun Yan
Tencent
Zhe Li
Huawei
Computer Vision, 3D Vision
Can Ma
Unknown affiliation
Weiping Wang
School of Information Science and Engineering, Central South University
Computer Network, Network Security
Yu Zhou
Nankai University
Han Hu
Distinguished Scientist, Tencent Hunyuan
Computer Vision, Deep Learning, Machine Learning