Multi-Field Adaptive Retrieval

📅 2024-10-26
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses retrieval over structured documents (for example, those containing headings, body text, and HTML tags) by treating them as multi-field objects. We propose an adaptive multi-field retrieval framework that decomposes documents into constituent fields and builds both dense and lexical indices for each field. A query-aware field importance model learns, end to end, differentiable weights for each field conditioned on the input query, and these weights drive a weighted fusion of field scores during ranking. Our key contribution is the first query-conditioned, fully learnable field importance mechanism, which eliminates manual weight tuning and supports arbitrary numbers and types of fields. On benchmark structured document retrieval tasks, our method achieves state-of-the-art performance and significantly outperforms established baselines. Empirical analysis further confirms that dense and lexical representations complement each other across heterogeneous fields.

📝 Abstract
Document retrieval for tasks such as search and retrieval-augmented generation typically involves datasets that are unstructured: free-form text without explicit internal structure in each document. However, documents can have a structured form, consisting of fields such as an article title, message body, or HTML header. To address this gap, we introduce Multi-Field Adaptive Retrieval (MFAR), a flexible framework that accommodates any number and any type of document indices on structured data. Our framework consists of two main steps: (1) decomposing an existing document into fields, each indexed independently through dense and lexical methods, and (2) learning a model that adaptively predicts the importance of each field by conditioning on the document query, allowing on-the-fly weighting of the most likely field(s). We find that our approach allows for the optimized use of dense versus lexical representations across field types, significantly improves document ranking over a number of existing retrievers, and achieves state-of-the-art performance for multi-field structured data.
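
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name here (`lexical_score`, `embed`, `field_weights`, `score_document`, `params`) is hypothetical, the term-count scorer is a stand-in for a real lexical method such as BM25, the hashed trigram vector is a stand-in for a learned dense encoder, and the softmax normalization over (field, scorer) pairs is an assumption that the abstract does not confirm. The weight vectors in `params` would be learned in practice; here they are left untrained.

```python
import math
import zlib
from collections import Counter

FIELDS = ("title", "body")        # hypothetical field names
SCORERS = ("lexical", "dense")    # one lexical and one dense scorer per field


def lexical_score(query, text):
    """Toy lexical scorer: occurrences of query terms in the field text."""
    counts = Counter(text.lower().split())
    return float(sum(counts[term] for term in set(query.lower().split())))


def embed(text, dim=16):
    """Toy 'dense' encoder: hashed character trigrams, L2-normalized."""
    vec = [0.0] * dim
    s = text.lower()
    for i in range(len(s) - 2):
        vec[zlib.crc32(s[i:i + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def dense_score(query, text):
    """Dot product of the toy embeddings of the query and the field text."""
    return sum(a * b for a, b in zip(embed(query), embed(text)))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def field_weights(query, params):
    """Query-conditioned weights over (field, scorer) pairs.

    `params` maps each (field, scorer) pair to a vector; the logit for a
    pair is the dot product of that vector with the query embedding.
    """
    q = embed(query)
    keys = sorted(params)
    logits = [sum(a * b for a, b in zip(params[k], q)) for k in keys]
    return dict(zip(keys, softmax(logits)))


def score_document(query, doc, params):
    """Weighted sum of per-field lexical and dense scores."""
    total = 0.0
    for (field, scorer), weight in field_weights(query, params).items():
        score_fn = lexical_score if scorer == "lexical" else dense_score
        total += weight * score_fn(query, doc.get(field, ""))
    return total


if __name__ == "__main__":
    # Untrained parameters: all-zero vectors give uniform weights.
    params = {(f, s): [0.0] * 16 for f in FIELDS for s in SCORERS}
    docs = [
        {"title": "Adaptive retrieval", "body": "field weighting for adaptive retrieval"},
        {"title": "Cooking pasta", "body": "boil water and add salt"},
    ]
    query = "adaptive retrieval"
    for doc in sorted(docs, key=lambda d: score_document(query, d, params), reverse=True):
        print(doc["title"], round(score_document(query, doc, params), 3))
```

Because the weights are recomputed from the query on every call, two queries can rank the same document differently depending on which fields the weighting favors, which is the behavior the abstract refers to as on-the-fly weighting.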
Problem

Research questions and friction points this paper is trying to address.

Handling structured document retrieval with multiple fields
Optimizing dense and lexical representations across field types
Improving document ranking for multi-field structured data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes documents into independent fields
Adaptively predicts field importance per query
Combines dense and lexical retrieval methods