🤖 AI Summary
This work presents the first systematic evaluation of large language models’ (LLMs) temporal understanding and reasoning over non-Gregorian calendars, focusing on the Japanese era-based system. To address the lack of culture-specific temporal benchmarks, we construct a four-task dataset covering calendar conversion, temporal arithmetic, cross-calendar consistency (e.g., bidirectional Gregorian–Japanese era mapping), and temporal knowledge retrieval, and evaluate state-of-the-art multilingual (English/Japanese) LLMs. Results show that while models perform reasonably well on basic calendar conversion, their accuracy drops sharply on era-based arithmetic (e.g., “Heisei 10 + 5 years”) and cross-calendar consistency, revealing deficits in culturally embedded temporal knowledge and symbolic temporal reasoning. The study uncovers systematic limitations of LLMs in non-Western temporal frameworks, establishing a new benchmark and an empirical foundation for culturally adaptive temporal evaluation and time-aware model development.
📝 Abstract
Temporal reasoning and knowledge are essential capabilities for language models (LMs). While much prior work has analyzed and improved temporal reasoning in LMs, most studies have focused solely on the Gregorian calendar. However, many non-Gregorian systems, such as the Japanese, Hijri, and Hebrew calendars, are in active use and reflect culturally grounded conceptions of time. Whether and how well current LMs handle such non-Gregorian calendars has not yet been evaluated. Here, we present a systematic evaluation of how well open-source LMs handle one such non-Gregorian system: the Japanese calendar. For our evaluation, we create datasets for four tasks that require both temporal knowledge and temporal reasoning. Evaluating a range of English-centric and Japanese-centric LMs, we find that some models can perform calendar conversions, but even Japanese-centric models struggle with Japanese-calendar arithmetic and with maintaining consistency across calendars. Our results highlight the importance of developing LMs that are better equipped for culture-specific calendar understanding.
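The kind of era-based conversion and arithmetic the tasks probe can be sketched in a few lines. This is a minimal illustration, not code from the paper: the era names and start years are public record, the function names are our own, and for simplicity it works at year granularity, ignoring mid-year era transitions (e.g., Heisei began on January 8, 1989).

```python
# Era start years (year granularity; mid-year transitions are ignored).
ERA_START = {"Meiji": 1868, "Taisho": 1912, "Showa": 1926,
             "Heisei": 1989, "Reiwa": 2019}

def era_to_gregorian(era: str, year: int) -> int:
    # Era year 1 coincides with the era's start year.
    return ERA_START[era] + year - 1

def gregorian_to_era(g_year: int) -> tuple[str, int]:
    # Pick the latest era that starts on or before the given year.
    era, start = max(
        (item for item in ERA_START.items() if item[1] <= g_year),
        key=lambda item: item[1],
    )
    return era, g_year - start + 1

# "Heisei 10 + 5 years": convert, add, convert back.
print(era_to_gregorian("Heisei", 10))      # 1998
print(gregorian_to_era(1998 + 5))          # ('Heisei', 15)
```

Even this toy mapping shows why the arithmetic task is harder than plain conversion: a correct answer requires composing era-to-year lookup, offset arithmetic, and year-to-era lookup, and the result may cross an era boundary.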