UniFAR: A Unified Facet-Aware Retrieval Framework for Scientific Documents

📅 2026-02-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses mismatches in input granularity, semantic focus, and training signals between the document-to-document (doc-doc) and question-to-document (q-doc) paradigms in scientific document retrieval. To bridge this gap, the authors propose UniFAR, a Unified Facet-Aware Retrieval framework that jointly supports both retrieval tasks within a single architecture. The framework introduces learnable facet anchors to align the discourse structure of scientific documents with question intent, uses adaptive multi-granularity aggregation to reconcile long documents with short questions, and unifies doc-doc and q-doc supervision through joint training. According to the authors, UniFAR consistently outperforms prior methods across multiple retrieval tasks and base models.
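To make the anchor idea concrete, here is a minimal sketch of how a shared set of learnable anchors could pool token vectors of any length into a fixed number of facet vectors, which are then mixed into one embedding. It is an illustration under stated assumptions, not the authors' implementation: the attention-style pooling, the softmax gates, the function names (`facet_pool`, `aggregate`), and the sizes are all hypothetical, and random vectors stand in for encoder outputs.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def facet_pool(token_vecs, anchors):
    """Attention-pool the token vectors once per anchor (assumed design).

    Returns one vector per anchor, regardless of how many tokens came in.
    """
    dim = len(anchors[0])
    facets = []
    for anchor in anchors:
        weights = softmax([dot(anchor, t) for t in token_vecs])
        facets.append(
            [sum(w * t[i] for w, t in zip(weights, token_vecs)) for i in range(dim)]
        )
    return facets


def aggregate(facets, gate_logits):
    """Mix facet vectors with softmax gates (a stand-in for adaptive aggregation)."""
    gates = softmax(gate_logits)
    dim = len(facets[0])
    return [sum(g * f[i] for g, f in zip(gates, facets)) for i in range(dim)]


random.seed(0)
DIM, NUM_FACETS = 4, 3  # hypothetical sizes
anchors = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_FACETS)]
doc_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(12)]   # long document
query_tokens = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]  # short question

doc_facets = facet_pool(doc_tokens, anchors)
query_facets = facet_pool(query_tokens, anchors)
doc_vec = aggregate(doc_facets, [0.0] * NUM_FACETS)      # uniform gates for the demo
query_vec = aggregate(query_facets, [0.0] * NUM_FACETS)
```

Because both inputs pass through the same anchors, a 12-token document and a 3-token question end up as embeddings of the same shape, which is the property a shared doc-doc and q-doc architecture needs.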

📝 Abstract

Existing scientific document retrieval (SDR) methods primarily rely on document-centric representations learned from inter-document relationships for document-document (doc-doc) retrieval. However, the rise of large language models (LLMs) and retrieval-augmented generation (RAG) has shifted SDR toward question-driven retrieval, where documents are retrieved in response to natural-language questions (q-doc). This change has led to systematic mismatches between document-centric models and question-driven retrieval, including (1) input granularity (long documents vs. short questions), (2) semantic focus (scientific discourse structure vs. specific question intent), and (3) training signals (citation-based similarity vs. question-oriented relevance). To address these mismatches, we propose UniFAR, a Unified Facet-Aware Retrieval framework that jointly supports doc-doc and q-doc SDR within a single architecture. UniFAR reconciles granularity differences through adaptive multi-granularity aggregation, aligns document structure with question intent via learnable facet anchors, and unifies doc-doc and q-doc supervision through joint training. Experimental results show that UniFAR consistently outperforms prior methods across multiple retrieval tasks and base models, confirming its effectiveness and generality.
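The abstract does not spell out the training objective, so the following is only one plausible reading of "joint training": a contrastive loss is computed separately for a doc-doc batch and a q-doc batch, and the two are combined with a mixing weight. The InfoNCE-style loss, the temperature, the `weight` parameter, and the convention that the positive candidate sits at index 0 are all assumptions made for illustration.

```python
import math


def info_nce(scores, positive_index, temperature=0.05):
    """Cross-entropy over similarity scores with a single positive candidate."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_z - scaled[positive_index]


def joint_loss(doc_doc_scores, q_doc_scores, weight=0.5):
    """Weighted sum of the two task losses (hypothetical mixing scheme).

    Each list holds similarity scores between one anchor (a document or a
    question) and its candidates; the positive is assumed to be at index 0.
    """
    loss_dd = info_nce(doc_doc_scores, 0)  # e.g. citation-based positives
    loss_qd = info_nce(q_doc_scores, 0)    # e.g. question-relevance positives
    return weight * loss_dd + (1.0 - weight) * loss_qd


# Toy similarity scores: the first candidate in each list is the positive.
loss = joint_loss([0.9, 0.2, 0.1], [0.7, 0.3, 0.4])
```

With a single set of encoder parameters receiving gradients from both terms, citation-based and question-oriented supervision would shape the same representation space, which is the effect the abstract attributes to joint training.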

Problem

Research questions and friction points this paper is trying to address.

scientific document retrieval

question-driven retrieval

document-centric representation

semantic mismatch

retrieval granularity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Retrieval Framework

Facet-Aware Representation

Multi-Granularity Aggregation