🤖 AI Summary
This paper addresses a limitation of existing time-sensitive question answering (TSQA) benchmarks: because their questions are grounded in real-world, web-searchable facts, they conflate temporal reasoning with memorized knowledge. To isolate temporal reasoning, we introduce UnSeenTimeQA, a data-contamination-free TSQA benchmark. It builds time-sensitive event scenarios from synthetically generated facts and poses three types of time-sensitive questions over sequential and parallel event occurrences, so that models cannot rely on knowledge acquired during pre-training. Our evaluation of five LLMs shows mixed results: the models perform well on simpler subsets, but their overall performance is lower than on real-world fact-based TSQA. Error analysis of the generated reasoning chains indicates that LLMs struggle with long-range event dependencies and with parallel event timelines that unfold concurrently.
📝 Abstract
This paper introduces UnSeenTimeQA, a novel data-contamination-free time-sensitive question-answering (TSQA) benchmark. It differs from existing TSQA benchmarks by avoiding web-searchable queries grounded in the real world. We present a series of time-sensitive event scenarios based on synthetically generated facts. The benchmark requires large language models (LLMs) to engage in genuine temporal reasoning without depending on factual knowledge acquired during pre-training. We design three types of time-sensitive questions to test LLMs' temporal reasoning abilities over sequential and parallel event occurrences. Our evaluation of five LLMs on synthetic fact-based TSQA reveals mixed results: while they perform well on simpler subsets, their overall performance remains inferior to that on real-world fact-based TSQA. Error analysis of LLM-generated reasoning chains indicates that LLMs have difficulty reasoning over long-range event dependencies and parallel event timelines that unfold concurrently.