UnIte: Uncertainty-based Iterative Document Sampling for Domain Adaptation in Information Retrieval

📅 2026-04-27

🤖 AI Summary
This work addresses a limitation of existing target-domain document sampling strategies in unsupervised domain adaptation for neural retrieval: they favor diversity but ignore model uncertainty, which leads to lower-quality pseudo-queries and less efficient adaptation. To overcome this, the authors propose UnIte, an uncertainty-aware iterative document sampling method that, for the first time, incorporates both aleatoric and epistemic uncertainty into the domain adaptation process. By filtering out documents with high aleatoric uncertainty and prioritizing those with high epistemic uncertainty, UnIte directs pseudo-query generation toward the documents that are most useful to the current model. Combining Bayesian uncertainty estimation with iterative sampling, the method substantially improves target-domain generalization: on the BEIR benchmark, it raises nDCG@10 by 2.45 and 3.49 points for small and large models, respectively, while sampling only 4k documents on average.
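The summary mentions Bayesian uncertainty estimation but not the specific estimator. As an illustration only, the sketch below uses a common entropy-based decomposition over T stochastic forward passes (assumed here to be MC-dropout samples or ensemble members): the entropy of the averaged prediction (total) splits into the average per-pass entropy (aleatoric) and the mutual information between prediction and model parameters (epistemic). The function names and the (T, N, C) probability layout are hypothetical; what UnIte actually scores per document is not stated in this summary.

```python
import numpy as np


def entropy(p: np.ndarray, axis: int = -1, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy in nats along `axis`; clipping avoids log(0)."""
    return -np.sum(p * np.log(np.clip(p, eps, 1.0)), axis=axis)


def decompose_uncertainty(probs: np.ndarray):
    """Split predictive uncertainty into aleatoric and epistemic parts.

    probs has shape (T, N, C): T stochastic forward passes (an assumption:
    MC dropout or an ensemble) over N documents, each pass giving a
    categorical distribution over C outcomes.
    Returns (total, aleatoric, epistemic), each of shape (N,).
    """
    mean_probs = probs.mean(axis=0)            # (N, C) approximate posterior predictive
    total = entropy(mean_probs)                # H[ E_theta p(y | x, theta) ]
    aleatoric = entropy(probs).mean(axis=0)    # E_theta H[ p(y | x, theta) ]
    epistemic = total - aleatoric              # mutual information I(y; theta | x)
    return total, aleatoric, epistemic


if __name__ == "__main__":
    # Toy input: 2 passes x 2 documents x 2 outcomes.
    probs = np.array([
        [[0.5, 0.5], [1.0, 0.0]],  # pass 1
        [[0.5, 0.5], [0.0, 1.0]],  # pass 2
    ])
    total, aleatoric, epistemic = decompose_uncertainty(probs)
    print("aleatoric:", aleatoric.round(3), "epistemic:", epistemic.round(3))
```

In the toy input, document 0 receives the same 50/50 prediction on every pass (aleatoric = ln 2, epistemic = 0), while document 1 receives confident but contradictory predictions (aleatoric = 0, epistemic = ln 2). Under the rule described above, the first would be filtered out and the second prioritized.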
📝 Abstract
Unsupervised domain adaptation generalizes neural retrievers to an unseen domain by generating pseudo queries on target domain documents. The quality and efficiency of this adaptation critically depend on which documents are selected for pseudo query generation. The existing document sampling method focuses on diversity but fails to capture model uncertainty. In contrast, we propose **Un**certainty-based **Ite**rative Document Sampling (UnIte), which addresses these limitations by (1) filtering out documents with high aleatoric uncertainty and (2) prioritizing those with high epistemic uncertainty, thereby maximizing the learning utility of the current model. Extensive experiments on large corpora from BEIR with small and large models show significant gains of +2.45 and +3.49 nDCG@10, respectively, with a smaller training sample size of 4k on average.
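The abstract's two-step rule (filter high-aleatoric documents, then prioritize high-epistemic ones), combined with the iterative re-estimation implied by the method's name, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The quantile cutoff `aleatoric_quantile`, the per-round budget `k_per_round`, and the callables `estimate`, `generate_queries`, and `train` are all hypothetical placeholders for the retriever's uncertainty estimator, the pseudo-query generator, and the fine-tuning step.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np


def select_documents(aleatoric, epistemic, k: int, aleatoric_quantile: float = 0.8) -> np.ndarray:
    """Return indices of up to k documents chosen by a hypothetical UnIte-style rule.

    Step 1: drop documents whose aleatoric uncertainty exceeds the given quantile
            (an assumed cutoff; the paper's exact criterion is not given here).
    Step 2: rank the survivors by epistemic uncertainty, highest first.
    """
    aleatoric = np.asarray(aleatoric, dtype=float)
    epistemic = np.asarray(epistemic, dtype=float)
    threshold = np.quantile(aleatoric, aleatoric_quantile)
    keep = np.flatnonzero(aleatoric <= threshold)          # step 1: filter noisy docs
    order = np.argsort(-epistemic[keep], kind="stable")    # step 2: most informative first
    return keep[order[:k]]


def iterative_sampling(model, corpus: Sequence[str], rounds: int, k_per_round: int,
                       estimate: Callable, generate_queries: Callable, train: Callable,
                       aleatoric_quantile: float = 0.8) -> Tuple[object, List[int]]:
    """Alternate uncertainty estimation, selection, pseudo-query generation and training.

    estimate(model, docs) -> (aleatoric, epistemic), one value per doc.
    generate_queries(doc) -> list of pseudo queries for that doc.
    train(model, [(query, doc), ...]) -> updated model.
    All three callables are hypothetical stand-ins.
    """
    selected: set = set()
    for _ in range(rounds):
        # Only documents not yet used are candidates in this round.
        pool = np.array([i for i in range(len(corpus)) if i not in selected], dtype=int)
        if pool.size == 0:
            break
        # Uncertainty is re-estimated with the current model every round.
        aleatoric, epistemic = estimate(model, [corpus[i] for i in pool])
        picked = pool[select_documents(aleatoric, epistemic, k_per_round, aleatoric_quantile)]
        selected.update(int(i) for i in picked)
        pairs = [(q, corpus[i]) for i in picked for q in generate_queries(corpus[i])]
        model = train(model, pairs)
    return model, sorted(selected)
```

For example, with `aleatoric = [0.1, 0.9, 0.2, 0.3, 0.05]` and `epistemic = [0.5, 0.99, 0.1, 0.7, 0.2]`, `select_documents(aleatoric, epistemic, k=2)` returns indices 3 and 0. Document 1 has the highest epistemic score, but it is dropped first because its aleatoric score lies above the 80th-percentile cutoff. The percentile cutoff is only a convenience that avoids tuning to the uncertainty scale; a fixed threshold would fit the same rule.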
Problem

domain adaptation
information retrieval
document sampling
pseudo query generation
model uncertainty
Innovation

uncertainty-based sampling
domain adaptation
information retrieval
pseudo query generation
neural retrievers