How often do Answers Change? Estimating Recency Requirements in Question Answering

📅 2026-03-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge that large language models often rely on outdated knowledge when handling time-sensitive queries and struggle to determine whether up-to-date information is required. Moving beyond the traditional binary distinction between "old" and "new" knowledge, the authors propose a recency-stationarity classification framework that jointly models how often answers change and whether that change frequency depends on context. They introduce RecencyQA, a manually annotated open-domain benchmark of 4,031 questions that enables fine-grained evaluation. Experiments show that non-stationary questions, those whose recency requirements vary with context, are significantly more challenging for current models, with difficulty increasing as update frequency rises, establishing a foundation for developing time-aware question-answering systems.

📝 Abstract
Large language models (LLMs) often rely on outdated knowledge when answering time-sensitive questions, leading to confident yet incorrect responses. Without explicit signals indicating whether up-to-date information is required, models struggle to decide when to retrieve external evidence, how to reason about stale facts, and how to rank answers by their validity. Existing benchmarks either periodically refresh answers or rely on fixed templates, but they do not reflect how frequently answers change or whether a question inherently requires up-to-date information. To address this gap, we introduce a recency-stationarity taxonomy that categorizes questions by how often their answers change and whether this change frequency is time-invariant or context-dependent. Building on this taxonomy, we present RecencyQA, a dataset of 4,031 open-domain questions annotated with recency and stationarity labels. Through human evaluation and empirical analysis, we show that non-stationary questions, i.e., those where context changes the recency requirement, are significantly more challenging for LLMs, with difficulty increasing as update frequency rises. By explicitly modeling recency and context dependence, RecencyQA enables fine-grained benchmarking and analysis of temporal reasoning beyond binary notions of freshness, and provides a foundation for developing recency-aware and context-sensitive question answering systems.
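The taxonomy in the abstract has two axes: how often an answer changes (recency) and whether that change frequency is fixed or context-dependent (stationarity). A minimal sketch of that data model is below; the class names, recency buckets, and example questions are illustrative assumptions, not the paper's actual annotation schema.

```python
from dataclasses import dataclass
from enum import Enum

class Recency(Enum):
    """Illustrative buckets for how often an answer changes."""
    STATIC = "answer never changes"
    SLOW = "answer changes over years"
    FAST = "answer changes within days"

@dataclass
class Question:
    text: str
    recency: Recency
    stationary: bool  # True if the recency requirement does not depend on context

# Hypothetical examples of each combination.
examples = [
    Question("What is the capital of France?", Recency.STATIC, True),
    Question("Who is the CEO of OpenAI?", Recency.SLOW, True),
    Question("What is the weather in Tokyo?", Recency.FAST, True),
    # Non-stationary: which answer is "current" depends on which
    # season or league the asker has in mind.
    Question("Who won the championship?", Recency.FAST, False),
]

non_stationary = [q.text for q in examples if not q.stationary]
print(non_stationary)  # the context-dependent questions
```

Under this framing, the paper's finding is that models degrade most on the last kind of question: the recency requirement itself must be inferred from context before the right answer can be retrieved.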
Problem

Research questions and friction points this paper is trying to address.

recency
question answering
temporal reasoning
knowledge freshness
answer validity
Innovation

Methods, ideas, or system contributions that make the work stand out.

recency-stationarity taxonomy
temporal reasoning
context-dependent recency
RecencyQA dataset
time-sensitive question answering