🤖 AI Summary
To address the performance degradation that information retrieval (IR) models suffer as web content evolves, this paper builds a temporal evaluation on Qwant web snapshots as part of the LongEval 2025 lab at CLEF. The authors propose a two-stage sparse retrieval architecture: (1) topic-modeling-driven query expansion to mitigate semantic drift, and (2) temporally aware document re-ranking to improve long-term stability. The approach operates entirely on sparse signals and requires no fine-tuning of large language models, keeping it efficient and robust. The best system achieves an average NDCG@10 of 0.296 across the full training and test dataset, and reaches 0.395 on the 2023-05 subset, substantially outperforming the baselines. These results support explicit temporal modeling and staged optimization for long-term IR.
📝 Abstract
Information Retrieval (IR) models are often trained on static datasets, making them vulnerable to performance degradation as web content evolves. The DS@GT competition team participated in the Longitudinal Evaluation of Model Performance (LongEval) lab at CLEF 2025, which evaluates IR systems across temporally distributed web snapshots. Our analysis of the Qwant web dataset includes exploratory data analysis with topic modeling over time. Our two-phase retrieval system employs sparse keyword search with query expansion and document reranking. The best system achieves an average NDCG@10 of 0.296 across the entire training and test dataset, with a best single-snapshot score of 0.395 on 2023-05. The accompanying source code for this paper is available at https://github.com/dsgt-arc/longeval-2025.
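The NDCG@10 scores reported above can be understood from the metric's standard definition: discounted cumulative gain over the top-10 ranked documents, normalized by the gain of an ideal ranking. The sketch below is a generic illustration, not the LongEval evaluation code; the function name and example relevance grades are hypothetical.

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k for one query.

    relevances: graded relevance of documents in the order the system
    ranked them (e.g. 0 = not relevant, 3 = highly relevant).
    """
    def dcg(rels):
        # Gain of each document, discounted by log2 of its rank position.
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A perfect ranking scores 1.0; misordering relevant documents lowers it.
print(ndcg_at_k([3, 2, 1, 0]))  # 1.0
print(ndcg_at_k([0, 2, 3, 1]))  # < 1.0
```

Benchmark scores such as the 0.296 average are the mean of this per-query value over all queries in a snapshot.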