Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties

📅 2025-02-24
🤖 AI Summary
Problem: Existing temporal reasoning benchmarks rely heavily on rule-based generation and lack historical depth, cultural context, and diversity of temporal entities, so they inadequately evaluate the temporal cognition of large language models (LLMs). Method: We introduce CTM (Chinese Time Reasoning), the first benchmark dedicated to Chinese dynastic chronology. CTM integrates historical semantic modeling, multi-granularity temporal alignment, and dynasty-specific cultural constraints into an evaluation framework covering cross-entity temporal inference, pairwise temporal alignment, and culturally contextualized reasoning. It draws on authoritative chronologies and historical texts to construct a high-quality annotated dataset, incorporating adversarial question design, temporal-logic validation, and expert-in-the-loop evaluation. Contribution/Results: Experiments reveal critical weaknesses in mainstream LLMs, including poor long-span dynastic reasoning, inaccurate parsing of ambiguous calendrical systems, and failure in event-relative positioning. CTM establishes a reproducible, highly diagnostic, history-domain benchmark for temporal cognitive modeling.

📝 Abstract
Temporal reasoning is fundamental to human cognition and is crucial for various real-world applications. While recent advances in Large Language Models (LLMs) have demonstrated promising capabilities in temporal reasoning, existing benchmarks primarily rely on rule-based construction, lack contextual depth, and involve a limited range of temporal entities. To address these limitations, we introduce Chinese Time Reasoning (CTM), a benchmark designed to evaluate LLMs on temporal reasoning within the extensive scope of Chinese dynastic chronology. CTM emphasizes cross-entity relationships, pairwise temporal alignment, and contextualized, culturally grounded reasoning, providing a comprehensive evaluation. Extensive experimental results reveal the challenges posed by CTM and highlight potential avenues for improvement.

Problem

Research questions and friction points this paper is trying to address.

Temporal reasoning in Chinese dynasties
Benchmark for Large Language Models
Cross-entity and contextual temporal alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Chinese Time Reasoning (CTM) benchmark
Focus on cross-entity relationships
Culturally grounded temporal reasoning
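
To make "pairwise temporal alignment" and "cross-entity relationships" concrete, the following is a minimal illustrative sketch, not the benchmark's actual data format or scoring code. The `Entity` class, the `DYNASTIES` table, and both helper functions are hypothetical names introduced here; the dynasty spans and lifetimes are conventional textbook dates. It treats each entity as a year interval and asks whether two intervals overlap.

```python
from dataclasses import dataclass

# Conventional reign spans (CE) for a few dynasties.
# Hypothetical lookup table for illustration, not CTM's dataset.
DYNASTIES = {
    "Tang": (618, 907),
    "Song": (960, 1279),
    "Ming": (1368, 1644),
    "Qing": (1644, 1912),
}


@dataclass(frozen=True)
class Entity:
    """A historical entity with an active span in years (CE)."""
    name: str
    start: int  # e.g. birth year
    end: int    # e.g. death year


def dynasties_of(entity: Entity) -> list[str]:
    """Return the dynasties whose span overlaps the entity's span."""
    return [
        dynasty
        for dynasty, (first, last) in DYNASTIES.items()
        if entity.start <= last and first <= entity.end
    ]


def aligned(a: Entity, b: Entity) -> bool:
    """True if the two spans share at least one year (interval overlap)."""
    return a.start <= b.end and b.start <= a.end


li_bai = Entity("Li Bai", 701, 762)
du_fu = Entity("Du Fu", 712, 770)
su_shi = Entity("Su Shi", 1037, 1101)

print(aligned(li_bai, du_fu))   # the two Tang poets were contemporaries
print(aligned(li_bai, su_shi))  # Li Bai died long before Su Shi was born
print(dynasties_of(su_shi))
```

A benchmark question of this kind would ask a model to make the same judgment from text alone, without access to an explicit date table.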
Zhenglin Wang
Southeast University
Natural Language Processing, Efficient NLP
Jialong Wu
School of Computer Science and Engineering, Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, China; Tongyi Lab, Alibaba Group
Pengfei Li
School of Computer Science and Engineering, Key Laboratory of Computer Network and Information Integration, Ministry of Education, Southeast University, China
Yong Jiang
Tongyi Lab, Alibaba Group
Deyu Zhou
Professor, School of Computer Science and Engineering, SEU
Natural Language Processing