🤖 AI Summary
This paper addresses performance degradation in information retrieval (IR) models caused by temporal drift in queries and relevance judgments. The authors propose a temporal IR evaluation framework designed for long-term deployment. Methodologically, they establish a longitudinal evaluation paradigm: (1) constructing a long-horizon, open-source benchmark dataset, temporally segmented and covering both web search and scientific retrieval; (2) designing a dynamic relevance annotation protocol and a cross-period performance attribution analysis; and (3) modeling the timeliness decay of retrieval models. Key contributions include: (1) the release of an open-source, long-horizon IR evaluation dataset, which attracted 27 international teams; (2) a systematic quantification of timeliness decay across mainstream models, revealing an average 32% drop in nDCG@10 under a six-month temporal shift; and (3) catalyzing a paradigm shift toward adaptive, temporally robust IR models.
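To make the decay measurement concrete, here is a minimal sketch of how timeliness decay could be quantified: nDCG@10 is computed per temporal test slice and compared against a reference slice close to the training period. The function names and data layout (`run_by_slice`, `qrels_by_slice`) are illustrative assumptions, not the lab's actual tooling.

```python
import math

def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k for one query: 'ranked_ids' is the system's ranking,
    'relevance' maps doc id -> graded relevance label."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def timeliness_decay(run_by_slice, qrels_by_slice, reference_slice):
    """Mean nDCG@10 per temporal test slice, plus the relative drop
    against a reference slice that is temporally close to training."""
    mean_scores = {}
    for slice_name, run in run_by_slice.items():
        qrels = qrels_by_slice[slice_name]
        scores = [ndcg_at_k(run[qid], qrels[qid]) for qid in run if qid in qrels]
        mean_scores[slice_name] = sum(scores) / len(scores) if scores else 0.0
    ref = mean_scores[reference_slice]
    return {name: {"ndcg@10": round(score, 4),
                   "drop_vs_ref": round((ref - score) / ref, 4) if ref else 0.0}
            for name, score in mean_scores.items()}
```

Under this setup, a reported figure such as a 32% drop would correspond to `drop_vs_ref == 0.32` for the test slice six months past the training cutoff.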
📝 Abstract
This paper presents the third edition of the LongEval Lab, part of the CLEF 2025 conference, which continues to explore the challenges of temporal persistence in Information Retrieval (IR). The lab features two tasks designed to provide researchers with test data that reflect how user queries and document relevance evolve over time. By evaluating how model performance degrades as test data diverge temporally from the training data, LongEval seeks to advance the understanding of temporal dynamics in IR systems. The 2025 edition aims to engage the IR and NLP communities in developing adaptive models that can maintain retrieval quality over time in the domains of web search and scientific retrieval.
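As a rough illustration of how test data can be made to diverge temporally from training data, the sketch below buckets timestamped queries into a training pool and test slices at increasing lag from a training cutoff. The `split_by_lag` helper, the `date` field, and the one/three/six-month lags are hypothetical choices for exposition, not the lab's actual protocol.

```python
from collections import defaultdict
from datetime import date

def split_by_lag(queries, train_end, lags_in_months=(1, 3, 6)):
    """Group timestamped queries into a training pool and test slices at
    increasing temporal distance from the training cutoff. Queries beyond
    the longest lag are dropped."""
    buckets = defaultdict(list)
    for q in queries:
        if q["date"] <= train_end:
            buckets["train"].append(q)
            continue
        months = ((q["date"].year - train_end.year) * 12
                  + (q["date"].month - train_end.month))
        for lag in sorted(lags_in_months):
            if months <= lag:
                buckets[f"lag_{lag}m"].append(q)
                break
    return buckets

# Hypothetical usage: queries are dicts with a 'date' field.
slices = split_by_lag(
    [{"qid": "q1", "date": date(2023, 2, 10)},
     {"qid": "q2", "date": date(2023, 6, 3)}],
    train_end=date(2023, 1, 31),
)
# slices["lag_1m"] holds queries within a month of the cutoff,
# slices["lag_6m"] the most distant slice evaluated.
```

Running a fixed retrieval model against each slice, and scoring each with the evaluation sketch above, yields the decay curve the lab is designed to expose.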