🤖 AI Summary
Dense retrieval (DR) models often yield inconsistent rankings for semantically equivalent but lexically distinct queries, undermining robustness and reliability. Method: We propose an improved multi-negative ranking loss that explicitly enforces distributional consistency of the top-k retrieved documents across semantically equivalent queries, without introducing additional parameters or inference overhead. Contribution/Results: Our approach improves retrieval consistency and accuracy simultaneously, the first method to achieve this dual gain. Extensive evaluation on MS-MARCO, Natural Questions, BEIR, and TREC DL demonstrates substantial robustness gains: average NDCG@10 improvements of 1.2–2.8% and a 15.3–22.7% increase in result overlap across query variants. These results indicate markedly reduced sensitivity to lexical paraphrasing while maintaining or enhancing retrieval effectiveness. The method provides a principled, parameter-free solution for building more robust and trustworthy dense retrieval systems.
Abstract
Dense Retrieval (DR) models have proven effective for document retrieval and information-grounding tasks. Usually, these models are trained and optimized to improve the relevance of the top-ranked documents for a given query. Previous work has shown that popular DR models are sensitive to the query and document lexicon: small lexical variations may lead to a significant difference in the set of retrieved documents. In this paper, we propose a variation of the Multi-Negative Ranking loss for training DR models that improves their coherence in retrieving the same documents for semantically similar queries. The loss penalizes discrepancies between the top-k ranked documents retrieved for diverse but semantically equivalent queries. We conducted extensive experiments on several datasets: MS-MARCO, Natural Questions, BEIR, and TREC DL 19/20. The results show that models optimized with our loss exhibit (i) lower sensitivity and, (ii) interestingly, higher accuracy.
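The paper does not give the loss in closed form here, but the idea described above can be sketched numerically. The following is a minimal, hypothetical NumPy illustration, not the authors' implementation: a standard multi-negative (in-batch softmax) ranking term, plus an assumed consistency penalty that compares the score distributions of two paraphrased queries over the union of their top-k documents via a symmetric KL divergence. All function names, the temperature, k, and the weighting `lam` are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_negative_ranking_loss(q, docs, pos_idx, temp=0.05):
    # Standard multi-negative ranking term: the positive document's score
    # competes against all other documents in the batch (in-batch negatives).
    scores = docs @ q / temp
    return -np.log(softmax(scores)[pos_idx])

def topk_consistency_penalty(q1, q2, docs, k=3):
    # Assumed consistency term: symmetric KL divergence between the softmax
    # score distributions of two paraphrased queries, restricted to the
    # union of their top-k retrieved documents.
    s1, s2 = docs @ q1, docs @ q2
    top = np.union1d(np.argsort(-s1)[:k], np.argsort(-s2)[:k])
    p, r = softmax(s1[top]), softmax(s2[top])
    return 0.5 * (np.sum(p * np.log(p / r)) + np.sum(r * np.log(r / p)))

# Toy unit-norm embeddings standing in for encoder outputs.
rng = np.random.default_rng(0)
docs = rng.normal(size=(8, 4))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
q = docs[2] + 0.1 * rng.normal(size=4)    # query close to document 2
q_para = q + 0.05 * rng.normal(size=4)    # a lexical paraphrase of q

loss = multi_negative_ranking_loss(q, docs, pos_idx=2)
penalty = topk_consistency_penalty(q, q_para, docs)
lam = 0.5                                 # assumed trade-off weight
total = loss + lam * penalty
```

In this sketch the penalty is zero exactly when both query variants induce the same score distribution over their shared top-k documents, which is one concrete way to operationalize "penalizing discrepancies between the top-k ranked documents" without adding parameters or inference cost.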