🤖 AI Summary
Existing information retrieval (IR) evaluation frameworks lack systematic methodologies for assessing long-term system performance as the underlying data evolves. Method: The LongEval lab provides a temporally aware evaluation framework designed for evolving collections. It constructs dynamic test collections for web and scientific literature retrieval, incorporating multi-period relevance annotations, document stream updates, and query drift to enable longitudinal analysis of system stability and adaptability. Systems are scored with standard metrics such as nDCG, together with measures that quantify how retrieval effectiveness changes over time. Contribution/Results: In its third edition, the lab attracted submissions from 19 international teams. The resulting analyses show systematic temporal performance decay across mainstream IR models and expose substantial inter-model differences in temporal adaptability. These findings push IR evaluation toward realistic, time-evolving environments and establish infrastructure for longitudinal IR research.
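
As a concrete illustration of what such a longitudinal evaluation loop can look like, here is a minimal sketch that scores one system's runs against each temporal snapshot's own relevance judgments. The data layout (`run_by_snapshot`, `qrels_by_snapshot`) and function names are assumptions made for illustration, not the lab's actual API:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances, k=10):
    """Standard nDCG@k: DCG of the system ranking over DCG of the ideal ranking."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

def ndcg_over_time(run_by_snapshot, qrels_by_snapshot, k=10):
    """Evaluate one system against each snapshot's own qrels.

    run_by_snapshot:   {snapshot: {query_id: [doc_id, ...]}} (ranked)
    qrels_by_snapshot: {snapshot: {query_id: {doc_id: graded_relevance}}}
    Returns {snapshot: mean nDCG@k}; snapshot keys are assumed to be
    sortable time labels such as 't0', 't1', ...
    """
    scores = {}
    for t, run in run_by_snapshot.items():
        qrels = qrels_by_snapshot[t]
        per_query = []
        for qid, ranking in run.items():
            # Map each retrieved document to its graded label at time t
            # (unjudged documents count as non-relevant).
            rels = [qrels.get(qid, {}).get(doc, 0) for doc in ranking]
            per_query.append(ndcg(rels, k))
        scores[t] = sum(per_query) / len(per_query) if per_query else 0.0
    return scores
```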
📝 Abstract
The LongEval lab focuses on the evaluation of information retrieval systems over time. Two datasets are provided that capture evolving search scenarios with changing documents, queries, and relevance assessments. Systems are assessed from a temporal perspective; that is, retrieval effectiveness is evaluated as the data the systems operate on changes. In its third edition, LongEval featured two retrieval tasks: one in the area of ad-hoc web retrieval, and another focusing on scientific article retrieval. We present an overview of this year's tasks and datasets, as well as the participating systems. A total of 19 teams submitted their approaches, which we evaluated using nDCG and a variety of measures that quantify changes in retrieval effectiveness over time.
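
One simple way to quantify the change in effectiveness that the abstract alludes to is the relative drop in mean nDCG with respect to the earliest snapshot. The measure name and baseline choice below are illustrative assumptions, not necessarily the lab's official definitions; the sketch builds on `ndcg_over_time()` from the example above:

```python
def relative_ndcg_drop(scores_by_snapshot, baseline='t0'):
    """Relative change in mean nDCG w.r.t. a baseline snapshot:
    (nDCG_baseline - nDCG_t) / nDCG_baseline.
    Positive values indicate decay; assumes the baseline score is > 0.
    'scores_by_snapshot' is the output of ndcg_over_time() above.
    """
    base = scores_by_snapshot[baseline]
    return {t: (base - s) / base
            for t, s in scores_by_snapshot.items() if t != baseline}

# Hypothetical example: a system whose effectiveness slowly decays
# as the collection evolves across three snapshots.
scores = {'t0': 0.42, 't1': 0.40, 't2': 0.37}
print(relative_ndcg_drop(scores))  # {'t1': ~0.048, 't2': ~0.119}
```

Reporting a per-snapshot score plus a relative drop like this separates two questions that a single aggregate score conflates: how well a system ranks at any one point in time, and how stable that effectiveness remains as documents, queries, and judgments change.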