The Wisdom of Many Queries: Complexity-Diversity Principle for Dense Retriever Training

📅 2026-02-10
🤖 AI Summary
This work addresses the inconsistent effectiveness of synthetic query diversity in dense retrieval, particularly the lack of guiding principles in multi-hop settings. The authors propose the Complexity-Diversity Principle (CDP), which ties the benefit of diversity to query complexity: on multi-hop data the two correlate strongly (r ≥ 0.95, p < 0.05 in 12 of 14 conditions), with complexity measured by the number of content words (CW). They introduce Q-D metrics to quantify diversity's impact and derive operational thresholds (CW > 10: use diversity; CW < 7: avoid it). Experiments on 4 benchmark types (31 datasets) show that query diversity especially benefits multi-hop retrieval, and CDP-guided zero-shot multi-query synthesis achieves state-of-the-art performance on multi-hop tasks.
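
As a rough illustration of the kind of correlation described above, the sketch below computes a correlation coefficient between per-condition query complexity and diversity benefit. It assumes the reported r is a Pearson coefficient, which the summary does not state. The function name `pearson_r` and the input values are hypothetical placeholders, not data or code from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder inputs (NOT results from the paper): mean content-word count
# per condition, and a synthetic "benefit" that is exactly linear in it.
complexity = [6.0, 8.0, 10.0, 12.0]
benefit = [2 * c + 1 for c in complexity]

r = pearson_r(complexity, benefit)
```

Because the placeholder benefit is an exact linear function of complexity, this serves only as a sanity check of the computation. A significance test (the p-value) would need an additional statistical routine that is not shown here.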

📝 Abstract
Prior work reports conflicting results on query diversity in synthetic data generation for dense retrieval. We identify this conflict and design Q-D metrics to quantify diversity's impact, making the problem measurable. Through experiments on 4 benchmark types (31 datasets), we find query diversity especially benefits multi-hop retrieval. Deep analysis on multi-hop data reveals that diversity benefit correlates strongly with query complexity ($r \geq 0.95$, $p < 0.05$ in 12/14 conditions), measured by content words (CW). We formalize this as the Complexity-Diversity Principle (CDP): query complexity determines optimal diversity. CDP provides actionable thresholds (CW $> 10$: use diversity; CW $< 7$: avoid it). Guided by CDP, we propose zero-shot multi-query synthesis for multi-hop tasks, achieving state-of-the-art performance.
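
The thresholds stated in the abstract can be sketched as a small decision rule. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not define how content words are counted, so the tokenizer, the small `STOPWORDS` list, and the names `count_content_words` and `cdp_recommendation` are all hypothetical. The abstract also gives no rule for CW between 7 and 10, so that range is returned as unspecified.

```python
import re

# Hypothetical, deliberately small stopword list. The paper's actual
# definition of a "content word" is not given in the abstract.
STOPWORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "by", "with",
    "from", "and", "or", "but", "is", "are", "was", "were", "be", "been",
    "do", "does", "did", "has", "have", "had", "who", "whom", "whose",
    "what", "which", "when", "where", "why", "how", "that", "this",
    "these", "those", "it", "its", "as",
})


def count_content_words(query: str) -> int:
    """Count alphanumeric tokens that are not in the assumed stopword list."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return sum(1 for token in tokens if token not in STOPWORDS)


def cdp_recommendation(cw: int) -> str:
    """Apply the thresholds stated in the abstract to a content-word count."""
    if cw > 10:
        return "use diversity"
    if cw < 7:
        return "avoid diversity"
    # 7 <= CW <= 10: the abstract states no rule for this range.
    return "unspecified"


query = "Who is the director of the film?"
cw = count_content_words(query)        # counts "director" and "film"
decision = cdp_recommendation(cw)
```

With this assumed stopword list, the short single-hop question above has a low content-word count and falls on the "avoid diversity" side of the thresholds. A different tokenizer or stopword list would shift counts near the boundaries.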
Problem

Research questions and friction points this paper is trying to address.

query diversity
dense retrieval
synthetic data generation
multi-hop retrieval
query complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Complexity-Diversity Principle
query diversity
dense retrieval
multi-hop retrieval
zero-shot query synthesis