LongEval at CLEF 2025: Longitudinal Evaluation of IR Model Performance

📅 2025-03-11
🤖 AI Summary
This paper addresses performance degradation in information retrieval (IR) models caused by temporal drift in queries and relevance. We propose the first temporal IR evaluation framework designed for long-term deployment. Methodologically, we establish a longitudinal evaluation paradigm: (1) constructing a long-horizon, open-source benchmark dataset—temporally segmented and covering both web search and scientific retrieval; (2) designing a dynamic relevance annotation protocol and cross-period performance attribution analysis; and (3) modeling model timeliness decay. Key contributions include: (1) releasing the first open-source, long-horizon IR evaluation dataset, which attracted 27 international teams; (2) the first systematic quantification of timeliness decay across mainstream models—revealing an average 32% drop in nDCG@10 under a six-month temporal shift; and (3) catalyzing a paradigm shift toward adaptive, temporally robust IR models.
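The summary's headline number is a relative drop in nDCG@10 between evaluation periods. As a minimal sketch of how such a figure is computed (the relevance grades below are invented for illustration and do not come from the paper):

```python
import math

def dcg_at_k(rels, k=10):
    # Discounted cumulative gain over the top-k relevance grades,
    # with the standard log2(rank + 1) position discount.
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k=10):
    # Normalise by the DCG of the ideal (descending) ordering.
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded relevance of one system's top-10 ranking
# on queries from the training period ...
within_period = [3, 2, 3, 0, 1, 2, 0, 0, 1, 0]
# ... and on temporally shifted queries six months later
# (illustrative numbers only).
shifted = [1, 0, 2, 3, 0, 0, 1, 0, 2, 0]

drop = 1 - ndcg_at_k(shifted) / ndcg_at_k(within_period)
print(f"relative nDCG@10 drop: {drop:.0%}")
```

Averaging this per-query relative drop across queries and systems yields the kind of aggregate decay figure the summary reports.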

📝 Abstract
This paper presents the third edition of the LongEval Lab, part of the CLEF 2025 conference, which continues to explore the challenges of temporal persistence in Information Retrieval (IR). The lab features two tasks designed to provide researchers with test data that reflect the evolving nature of user queries and document relevance over time. By evaluating how model performance degrades as test data diverge temporally from training data, LongEval seeks to advance the understanding of temporal dynamics in IR systems. The 2025 edition aims to engage the IR and NLP communities in addressing the development of adaptive models that can maintain retrieval quality over time in the domains of web search and scientific retrieval.
Problem

Research questions and friction points this paper is trying to address.

Explores temporal persistence challenges in Information Retrieval.
Evaluates IR model performance degradation over time.
Develops adaptive models for web and scientific retrieval.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Releases temporally segmented, long-horizon test collections for web search and scientific retrieval.
Captures the evolving nature of user queries and document relevance over time.
Quantifies timeliness decay to guide the development of adaptive retrieval models.
👥 Authors

Matteo Cancellieri
The Open University, Milton Keynes, UK

Alaa El-Ebshihy
Research Studios Austria, Data Science Studio, Vienna, Austria; TU Wien, Austria

Tobias Fink
Research Studios Austria, Data Science Studio, Vienna, Austria

Petra Galuščáková
University of Stavanger, Stavanger, Norway

Gabriela González-Sáez
Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France

Lorraine Goeuriot
Université Grenoble Alpes

David Iommi
Research Studios Austria, Data Science Studio, Vienna, Austria

Jüri Keller
TH Köln - University of Applied Sciences

Petr Knoth
Professor of Data Science, Knowledge Media Institute, The Open University

P. Mulhem
Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France

Florina Piroi
Research Studios Austria, Data Science Studio, Vienna, Austria; TU Wien, Austria

David Pride
The Knowledge Media Institute, The Open University

Philipp Schaer
TH Köln - University of Applied Sciences