🤖 AI Summary
This paper addresses performance degradation in information retrieval (IR) models caused by temporal drift in queries and relevance judgments. The authors propose a temporal IR evaluation framework designed for long-term deployment. Methodologically, they establish a longitudinal evaluation paradigm: (1) constructing a long-horizon, open-source benchmark dataset, temporally segmented and covering both web search and scientific retrieval; (2) designing a dynamic relevance annotation protocol and a cross-period performance attribution analysis; and (3) modeling the timeliness decay of retrieval models. Key contributions include: (1) the release of an open-source, long-horizon IR evaluation dataset, which attracted 27 international teams; (2) a systematic quantification of timeliness decay across mainstream models, revealing an average 32% drop in nDCG@10 under a six-month temporal shift; and (3) catalyzing a paradigm shift toward adaptive, temporally robust IR models.
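To make the decay measurement concrete, here is a minimal sketch of how timeliness decay could be quantified: nDCG@10 is computed per temporal test slice and compared against a reference slice close to the training period. The function names and data layout (`run_by_slice`, `qrels_by_slice`) are illustrative assumptions, not the lab's actual tooling.

```python
import math

def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k for one query: 'ranked_ids' is the system's ranking,
    'relevance' maps doc id -> graded relevance label."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def timeliness_decay(run_by_slice, qrels_by_slice, reference_slice):
    """Mean nDCG@10 per temporal test slice, plus the relative drop
    against a reference slice that is temporally close to training."""
    mean_scores = {}
    for slice_name, run in run_by_slice.items():
        qrels = qrels_by_slice[slice_name]
        scores = [ndcg_at_k(run[qid], qrels[qid]) for qid in run if qid in qrels]
        mean_scores[slice_name] = sum(scores) / len(scores) if scores else 0.0
    ref = mean_scores[reference_slice]
    return {name: {"ndcg@10": round(score, 4),
                   "drop_vs_ref": round((ref - score) / ref, 4) if ref else 0.0}
            for name, score in mean_scores.items()}
```

Under this setup, a reported figure such as a 32% drop would correspond to `drop_vs_ref == 0.32` for the test slice six months past the training cutoff.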
📝 Abstract
This paper presents the third edition of the LongEval Lab, part of the CLEF 2025 conference, which continues to explore the challenges of temporal persistence in Information Retrieval (IR). The lab features two tasks designed to provide researchers with test data that reflect how user queries and document relevance evolve over time. By evaluating how model performance degrades as test data diverge temporally from the training data, LongEval seeks to advance the understanding of temporal dynamics in IR systems. The 2025 edition aims to engage the IR and NLP communities in developing adaptive models that can maintain retrieval quality over time in the domains of web search and scientific retrieval.
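As a rough illustration of how test data can be made to diverge temporally from training data, the sketch below buckets timestamped queries into a training pool and test slices at increasing lag from a training cutoff. The `split_by_lag` helper, the `date` field, and the one/three/six-month lags are hypothetical choices for exposition, not the lab's actual protocol.

```python
from collections import defaultdict
from datetime import date

def split_by_lag(queries, train_end, lags_in_months=(1, 3, 6)):
    """Group timestamped queries into a training pool and test slices at
    increasing temporal distance from the training cutoff. Queries beyond
    the longest lag are dropped."""
    buckets = defaultdict(list)
    for q in queries:
        if q["date"] <= train_end:
            buckets["train"].append(q)
            continue
        months = ((q["date"].year - train_end.year) * 12
                  + (q["date"].month - train_end.month))
        for lag in sorted(lags_in_months):
            if months <= lag:
                buckets[f"lag_{lag}m"].append(q)
                break
    return buckets

# Hypothetical usage: queries are dicts with a 'date' field.
slices = split_by_lag(
    [{"qid": "q1", "date": date(2023, 2, 10)},
     {"qid": "q2", "date": date(2023, 6, 3)}],
    train_end=date(2023, 1, 31),
)
# slices["lag_1m"] holds queries within a month of the cutoff,
# slices["lag_6m"] the most distant slice evaluated.
```

Running a fixed retrieval model against each slice, and scoring each with the evaluation sketch above, yields the decay curve the lab is designed to expose.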