🤖 AI Summary
Traditional retrieval methods (e.g., BM25, DPR) struggle to jointly optimize topical relevance and temporal alignment in time-sensitive retrieval. To address this gap, the paper introduces explicit temporal signal modeling into dense passage retrieval: it jointly encodes query timestamps and document publication dates into the dense representation space of a BERT dual-encoder architecture, and proposes a temporally aware negative sampling strategy coupled with a contrastive learning objective to sharpen temporal semantic discrimination. The core innovations are an end-to-end fusion mechanism for temporal embeddings and a temporally aware training paradigm. Experiments on ArchivalQA and ChroniclingAmericaQA show significant improvements: Top-1 accuracy increases by 6.63% and 9.56%, respectively, while NDCG@10 improves by 3.79% and 4.68%, substantially outperforming baseline methods.
📝 Abstract
Temporal awareness is crucial in many information retrieval tasks, particularly in scenarios where the relevance of documents depends on their alignment with the query's temporal context. Traditional retrieval methods such as BM25 and Dense Passage Retrieval (DPR) excel at capturing lexical and semantic relevance but fall short on time-sensitive queries. To bridge this gap, we introduce a temporal retrieval model that integrates explicit temporal signals by incorporating query timestamps and document dates into the representation space. Our approach ensures that retrieved passages are not only topically relevant but also temporally aligned with user intent. We evaluate our approach on two large-scale benchmark datasets, ArchivalQA and ChroniclingAmericaQA, achieving substantial performance gains over standard retrieval baselines. In particular, our model improves Top-1 retrieval accuracy by 6.63% and NDCG@10 by 3.79% on ArchivalQA, while yielding a 9.56% boost in Top-1 retrieval accuracy and 4.68% in NDCG@10 on ChroniclingAmericaQA. Additionally, we introduce a time-sensitive negative sampling strategy, which refines the model's ability to distinguish between temporally relevant and irrelevant documents during training. Our findings highlight the importance of explicitly modeling time in retrieval systems and set a new standard for handling temporally grounded queries.
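The abstract does not include implementation details, but the two core ideas — injecting a date embedding into the representation space, and mining hard negatives that are topically plausible yet temporally distant — can be sketched as follows. Everything here is an illustrative assumption: the sinusoidal year encoding, the concatenation-based fusion, and the function names are not from the paper.

```python
import math

def year_embedding(year, dim=8, base_year=1800, span=300):
    """Sinusoidal encoding of a normalized year, analogous to positional
    encodings. dim, base_year and span are illustrative choices, not the
    paper's actual parameters."""
    t = (year - base_year) / span  # normalize to roughly [0, 1]
    emb = []
    for i in range(dim // 2):
        freq = 10000 ** (2 * i / dim)
        emb.append(math.sin(t * freq))
        emb.append(math.cos(t * freq))
    return emb

def fuse(text_emb, year):
    """Fuse a text embedding with a temporal embedding by concatenation —
    one simple fusion choice; the paper's end-to-end mechanism may differ."""
    return list(text_emb) + year_embedding(year, dim=len(text_emb))

def temporal_negatives(query_year, candidates, k=2):
    """Time-sensitive negative sampling (sketch): among already-retrieved,
    topically similar candidates, pick those whose publication dates are
    farthest from the query's timestamp as hard negatives."""
    return sorted(candidates, key=lambda d: -abs(d["year"] - query_year))[:k]
```

A contrastive objective would then pull the fused query embedding toward temporally aligned positives and push it away from these date-mismatched negatives.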