Overview of the TREC 2022 deep learning track

📅 2025-07-10
🤖 AI Summary
To address insufficient test-collection coverage in the passage retrieval task, the TREC 2022 Deep Learning track constructs a more complete, high-quality, passage-level manually judged test collection, while retaining document ranking as a secondary task. Methodologically, the track runs systematic experiments over the expanded MS MARCO passage and document corpora, spanning large-scale pretrained language models, dense retrieval, and traditional sparse retrieval techniques, with particular emphasis on judgment quality to improve the collection's discriminative power and reusability. Key contributions: (1) releasing a higher signal-to-noise passage retrieval benchmark; (2) observing that some top-performing runs did not use dense retrieval; and (3) finding that single-stage dense retrieval was less competitive than in the previous year, evidence that can guide the development of multi-stage ranking pipelines.
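The multi-stage hybrid pipelines mentioned above typically fuse sparse and dense candidate lists before reranking. A minimal sketch of one standard fusion method, reciprocal rank fusion (RRF), follows; the document IDs and the k=60 constant are illustrative defaults, not details from the paper:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    Each document's fused score is the sum over lists of
    1 / (k + rank), where rank is 1-based; k=60 is the
    conventional default for RRF.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical candidate lists from a sparse (BM25) and a dense retriever.
sparse_run = ["d1", "d2", "d3", "d4"]
dense_run = ["d3", "d1", "d5", "d2"]
fused = reciprocal_rank_fusion([sparse_run, dense_run])
```

Documents ranked highly by both retrievers (here "d1" and "d3") rise to the top of the fused list, which is then handed to a deep reranker in a multi-stage setup.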

📝 Abstract
This is the fourth year of the TREC Deep Learning track. As in previous years, we leverage the MS MARCO datasets, which made hundreds of thousands of human-annotated training labels available for both passage and document ranking tasks. In addition, this year we also leverage the refreshed passage and document collections released last year, yielding a nearly 16-fold increase in the size of the passage collection and a nearly four-fold increase in the document collection size. Unlike previous years, in 2022 we mainly focused on constructing a more complete test collection for the passage retrieval task, which has been the primary focus of the track. The document ranking task was kept as a secondary task, where document-level labels were inferred from the passage-level labels. Our analysis shows that, as in previous years, deep neural ranking models that employ large-scale pretraining continued to outperform traditional retrieval methods. Because we focused our judging resources on passage judging, we are more confident in the quality of this year's queries and judgments, with respect to our ability to distinguish between runs and to reuse the dataset in future. We also see some surprises in overall outcomes. Some top-performing runs did not do dense retrieval. Runs that did single-stage dense retrieval were not as competitive this year as they were last year.
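The abstract notes that document-level labels were inferred from passage-level labels. A minimal sketch of one plausible mapping, assuming max-aggregation (a document inherits the highest grade among its judged passages; the track's exact rules are specified in the overview paper itself):

```python
def infer_document_labels(passage_labels, passage_to_doc):
    """Map passage-level relevance grades to document-level grades.

    passage_labels: {passage_id: grade}
    passage_to_doc: {passage_id: doc_id}
    Each document receives the maximum grade of its judged passages.
    """
    doc_labels = {}
    for pid, grade in passage_labels.items():
        doc = passage_to_doc[pid]
        doc_labels[doc] = max(doc_labels.get(doc, 0), grade)
    return doc_labels

# Hypothetical judgments: two passages from doc "D1", one from "D2".
labels = infer_document_labels(
    {"p1": 2, "p2": 0, "p3": 1},
    {"p1": "D1", "p2": "D1", "p3": "D2"},
)
```

Max-aggregation reflects the intuition that a document is relevant if any passage within it answers the query, which is why a passage-focused judging effort can still support a document ranking task.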
Problem

Research questions and friction points this paper is trying to address.

Constructing a complete test collection for passage retrieval
Comparing deep neural ranking models with traditional methods
Analyzing performance of dense retrieval approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverage MS MARCO datasets for training
Use refreshed passage and document collections
Focus on passage retrieval test collection