🤖 AI Summary
To address the performance degradation that information retrieval (IR) models suffer as web content evolves, this paper builds a temporal evaluation on Qwant web snapshots as part of the LongEval 2025 lab at CLEF. The authors propose a two-stage sparse retrieval architecture: (1) topic-modeling-driven query expansion to mitigate semantic drift, and (2) temporally aware document re-ranking to improve long-term stability. The approach operates entirely on sparse signals and requires no fine-tuning of large language models, keeping it efficient and robust. The best system achieves an average NDCG@10 of 0.296 across the full training and test dataset, and reaches 0.395 on the 2023-05 subset, substantially outperforming the baselines. These results support explicit temporal modeling and staged optimization for long-term IR.
📝 Abstract
Information Retrieval (IR) models are often trained on static datasets, making them vulnerable to performance degradation as web content evolves. The DS@GT competition team participated in the Longitudinal Evaluation of Model Performance (LongEval) lab at CLEF 2025, which evaluates IR systems across temporally distributed web snapshots. Our analysis of the Qwant web dataset includes exploratory data analysis with topic modeling over time. Our two-phase retrieval system employs sparse keyword search with query expansion and document reranking. The best system achieves an average NDCG@10 of 0.296 across the entire training and test dataset, with a best single-snapshot score of 0.395 on 2023-05. The accompanying source code for this paper is available at https://github.com/dsgt-arc/longeval-2025.
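The NDCG@10 scores reported above can be understood from the metric's standard definition: discounted cumulative gain over the top-10 ranked documents, normalized by the gain of an ideal ranking. The sketch below is a generic illustration, not the LongEval evaluation code; the function name and example relevance grades are hypothetical.

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k for one query.

    relevances: graded relevance of documents in the order the system
    ranked them (e.g. 0 = not relevant, 3 = highly relevant).
    """
    def dcg(rels):
        # Gain of each document, discounted by log2 of its rank position.
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A perfect ranking scores 1.0; misordering relevant documents lowers it.
print(ndcg_at_k([3, 2, 1, 0]))  # 1.0
print(ndcg_at_k([0, 2, 3, 1]))  # < 1.0
```

Benchmark scores such as the 0.296 average are the mean of this per-query value over all queries in a snapshot.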