🤖 AI Summary
Existing information retrieval (IR) evaluation frameworks lack systematic methodologies for assessing long-term system performance as the underlying data evolves. Method: The LongEval lab provides a temporally aware evaluation framework designed for evolving collections. It constructs dynamic test collections for web and scientific literature retrieval, incorporating multi-period relevance annotations, document stream updates, and query drift to enable longitudinal analysis of system stability and adaptability. Systems are scored with standard metrics such as nDCG, together with measures that quantify how retrieval effectiveness changes over time. Contribution/Results: In its third edition, the lab attracted submissions from 19 international teams. The resulting analyses show systematic temporal performance decay across mainstream IR models and expose substantial inter-model differences in temporal adaptability. These findings push IR evaluation toward realistic, time-evolving environments and establish infrastructure for longitudinal IR research.
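
As a concrete illustration of what such a longitudinal evaluation loop can look like, here is a minimal sketch that scores one system's runs against each temporal snapshot's own relevance judgments. The data layout (`run_by_snapshot`, `qrels_by_snapshot`) and function names are assumptions made for illustration, not the lab's actual API:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances, k=10):
    """Standard nDCG@k: DCG of the system ranking over DCG of the ideal ranking."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

def ndcg_over_time(run_by_snapshot, qrels_by_snapshot, k=10):
    """Evaluate one system against each snapshot's own qrels.

    run_by_snapshot:   {snapshot: {query_id: [doc_id, ...]}} (ranked)
    qrels_by_snapshot: {snapshot: {query_id: {doc_id: graded_relevance}}}
    Returns {snapshot: mean nDCG@k}; snapshot keys are assumed to be
    sortable time labels such as 't0', 't1', ...
    """
    scores = {}
    for t, run in run_by_snapshot.items():
        qrels = qrels_by_snapshot[t]
        per_query = []
        for qid, ranking in run.items():
            # Map each retrieved document to its graded label at time t
            # (unjudged documents count as non-relevant).
            rels = [qrels.get(qid, {}).get(doc, 0) for doc in ranking]
            per_query.append(ndcg(rels, k))
        scores[t] = sum(per_query) / len(per_query) if per_query else 0.0
    return scores
```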
📝 Abstract
The LongEval lab focuses on the evaluation of information retrieval systems over time. Two datasets are provided that capture evolving search scenarios with changing documents, queries, and relevance assessments. Systems are assessed from a temporal perspective; that is, retrieval effectiveness is evaluated as the data the systems operate on changes. In its third edition, LongEval featured two retrieval tasks: one in the area of ad-hoc web retrieval, and another focusing on scientific article retrieval. We present an overview of this year's tasks and datasets, as well as the participating systems. A total of 19 teams submitted their approaches, which we evaluated using nDCG and a variety of measures that quantify changes in retrieval effectiveness over time.
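
One simple way to quantify the change in effectiveness that the abstract alludes to is the relative drop in mean nDCG with respect to the earliest snapshot. The measure name and baseline choice below are illustrative assumptions, not necessarily the lab's official definitions; the sketch builds on `ndcg_over_time()` from the example above:

```python
def relative_ndcg_drop(scores_by_snapshot, baseline='t0'):
    """Relative change in mean nDCG w.r.t. a baseline snapshot:
    (nDCG_baseline - nDCG_t) / nDCG_baseline.
    Positive values indicate decay; assumes the baseline score is > 0.
    'scores_by_snapshot' is the output of ndcg_over_time() above.
    """
    base = scores_by_snapshot[baseline]
    return {t: (base - s) / base
            for t, s in scores_by_snapshot.items() if t != baseline}

# Hypothetical example: a system whose effectiveness slowly decays
# as the collection evolves across three snapshots.
scores = {'t0': 0.42, 't1': 0.40, 't2': 0.37}
print(relative_ndcg_drop(scores))  # {'t1': ~0.048, 't2': ~0.119}
```

Reporting a per-snapshot score plus a relative drop like this separates two questions that a single aggregate score conflates: how well a system ranks at any one point in time, and how stable that effectiveness remains as documents, queries, and judgments change.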