LongEval at CLEF 2025: Longitudinal Evaluation of IR Systems on Web and Scientific Data

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing information retrieval (IR) evaluation frameworks lack systematic methodologies for assessing long-term system performance as the underlying data evolves. Method: The LongEval lab provides a temporal-aware evaluation framework designed for evolving collections. It constructs dynamic test collections for web and scientific literature retrieval, incorporating multi-period relevance annotations, document stream updates, and query drift, which enables longitudinal analysis of system stability and adaptability. Standard metrics such as nDCG are paired with measures that quantify how retrieval effectiveness changes over time. Contribution/Results: In its third edition, the lab attracted submissions from 19 international teams. The resulting analyses reveal systematic temporal performance decay across mainstream IR models and expose substantial inter-model differences in temporal adaptability. These findings move IR evaluation toward realistic, time-evolving environments and build infrastructure for longitudinal IR research.
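The summary refers to temporal extensions of standard metrics without spelling them out. A minimal sketch of one such extension, assuming per-snapshot runs and relevance judgments, is to compute mean nDCG@k per collection snapshot and read performance fluctuation off the trajectory; the data layout and function names below are illustrative, not the lab's actual tooling.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for a ranked list of graded relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked_rels, k=10):
    """nDCG@k: DCG of the system ranking normalised by the ideal ranking."""
    ideal_dcg = dcg(sorted(ranked_rels, reverse=True)[:k])
    return dcg(ranked_rels[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

def ndcg_over_time(run_by_snapshot, qrels_by_snapshot, k=10):
    """Mean nDCG@k per snapshot, e.g. {'2023-01': 0.41, '2023-06': 0.37}.

    run_by_snapshot:   snapshot -> query -> ranked list of doc ids
    qrels_by_snapshot: snapshot -> query -> {doc id: graded relevance}
    """
    scores = {}
    for t, run in run_by_snapshot.items():
        qrels = qrels_by_snapshot[t]
        per_query = [
            ndcg([qrels[q].get(d, 0) for d in docs], k)
            for q, docs in run.items() if q in qrels
        ]
        scores[t] = sum(per_query) / len(per_query) if per_query else 0.0
    return scores
```

A flat trajectory across snapshots indicates a temporally stable system; a downward slope is the decay pattern the summary describes.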

📝 Abstract
The LongEval lab focuses on the evaluation of information retrieval systems over time. Two datasets are provided that capture evolving search scenarios with changing documents, queries, and relevance assessments. Systems are assessed from a temporal perspective, that is, evaluating retrieval effectiveness as the data they operate on changes. In its third edition, LongEval featured two retrieval tasks: one in the area of ad-hoc web retrieval, and another focusing on scientific article retrieval. We present an overview of this year's tasks and datasets, as well as the participating systems. A total of 19 teams submitted their approaches, which we evaluated using nDCG and a variety of measures that quantify changes in retrieval effectiveness over time.
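The abstract leaves the change-over-time measures unnamed. One plausible illustration, assuming per-snapshot mean nDCG scores such as those produced by the sketch above, is the score delta between consecutive snapshots together with Kendall's tau between two snapshots' system orderings; both helpers below are hypothetical, not the lab's official measures.

```python
from itertools import combinations

def effectiveness_delta(scores):
    """Change in a system's mean nDCG between consecutive snapshots.

    scores: snapshot label -> mean nDCG, labels in chronological order.
    """
    labels = list(scores)
    return {f"{a}->{b}": scores[b] - scores[a] for a, b in zip(labels, labels[1:])}

def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau between two orderings of the same set of system names."""
    pos_a = {s: i for i, s in enumerate(ranking_a)}
    pos_b = {s: i for i, s in enumerate(ranking_b)}
    concordant = discordant = 0
    for s, t in combinations(ranking_a, 2):
        agree = (pos_a[s] - pos_a[t]) * (pos_b[s] - pos_b[t])
        if agree > 0:
            concordant += 1
        elif agree < 0:
            discordant += 1
    n_pairs = concordant + discordant
    return (concordant - discordant) / n_pairs if n_pairs else 1.0
```

A tau near 1 means the leaderboard stays stable even if absolute scores drift, while a large negative delta flags a system that degrades as the collection evolves.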
Problem

Research questions and friction points this paper is trying to address.

Evaluating information retrieval systems over time with evolving data
Assessing retrieval effectiveness as documents and queries change
Measuring temporal performance changes in web and scientific retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Longitudinal evaluation of IR systems
Two datasets capturing evolving search scenarios
Assessment of retrieval effectiveness over time
👥 Authors
Matteo Cancellieri
The Open University, Milton Keynes, UK
Alaa El-Ebshihy
Research Studios Austria, Data Science Studio, Vienna, Austria; TU Wien, Austria
Tobias Fink
Research Studios Austria, Data Science Studio, Vienna, Austria; TU Wien, Austria
Maik Fröbe
Friedrich-Schiller-Universität Jena
Information Retrieval, Natural Language Processing
Petra Galuščáková
University of Stavanger, Stavanger, Norway
Gabriela Gonzalez-Saez
Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France
Lorraine Goeuriot
Université Grenoble Alpes
David Iommi
Research Studios Austria, Data Science Studio, Vienna, Austria
Jüri Keller
TH Köln - University of Applied Sciences
Information Retrieval
Petr Knoth
Professor of Data Science, Knowledge Media Institute, The Open University
Data Science, NLP, Information Retrieval, Scholarly Communication, Open Science
Philippe Mulhem
LIG-CNRS
Computer Science, Information Retrieval
Florina Piroi
Research Studios Austria, Data Science Studio, Vienna, Austria; TU Wien, Austria
David Pride
The Knowledge Media Institute, The Open University
Bibliometrics, Scientometrics, Semantometrics, Natural Language Processing, Citation Analysis
Philipp Schaer
TH Köln - University of Applied Sciences
Information Retrieval, Information Science, Digital Libraries