UniFAR: A Unified Facet-Aware Retrieval Framework for Scientific Documents

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the mismatch in input granularity, semantic focus, and training signals between the document-to-document (doc-doc) and question-to-document (q-doc) paradigms in scientific document retrieval. To bridge this gap, the authors propose UniFAR, a unified facet-aware retrieval framework that supports both retrieval tasks within a single architecture. The framework introduces learnable facet anchors to align the discourse structure of scientific documents with question intent, uses adaptive multi-granularity aggregation to reconcile long documents with short questions, and unifies doc-doc and q-doc supervision through joint training. The authors report that the approach consistently outperforms prior methods across multiple retrieval tasks and base models.

📝 Abstract
Existing scientific document retrieval (SDR) methods primarily rely on document-centric representations learned from inter-document relationships for document-document (doc-doc) retrieval. However, the rise of LLMs and RAG has shifted SDR toward question-driven retrieval, where documents are retrieved in response to natural-language questions (q-doc). This change has led to systematic mismatches between document-centric models and question-driven retrieval, including (1) input granularity (long documents vs. short questions), (2) semantic focus (scientific discourse structure vs. specific question intent), and (3) training signals (citation-based similarity vs. question-oriented relevance). To this end, we propose UniFAR, a Unified Facet-Aware Retrieval framework to jointly support doc-doc and q-doc SDR within a single architecture. UniFAR reconciles granularity differences through adaptive multi-granularity aggregation, aligns document structure with question intent via learnable facet anchors, and unifies doc-doc and q-doc supervision through joint training. Experimental results show that UniFAR consistently outperforms prior methods across multiple retrieval tasks and base models, confirming its effectiveness and generality.
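
As a rough illustration of how a shared set of facet anchors can map a long document and a short question to representations of the same shape, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the aggregation or scoring functions, so the attention-style pooling, the cosine-based score, and the names `facet_pool` and `facet_score` are all assumptions made for illustration, and the random anchors stand in for parameters that would be learned during joint training.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def facet_pool(token_emb, anchors):
    """Hypothetical pooling: each anchor attends over the tokens of one text.

    token_emb: (n_tokens, d) token embeddings of a document or question.
    anchors:   (n_facets, d) facet anchors shared by both input types.
    Returns:   (n_facets, d) one vector per facet, regardless of text length.
    """
    scores = anchors @ token_emb.T / np.sqrt(token_emb.shape[1])
    weights = softmax(scores, axis=-1)  # each row sums to 1 over the tokens
    return weights @ token_emb

def facet_score(query_facets, doc_facets):
    """Hypothetical relevance score: mean cosine similarity of matching facets."""
    q = query_facets / np.linalg.norm(query_facets, axis=1, keepdims=True)
    d = doc_facets / np.linalg.norm(doc_facets, axis=1, keepdims=True)
    return float((q * d).sum(axis=1).mean())

rng = np.random.default_rng(0)
d_model, n_facets = 16, 4
anchors = rng.normal(size=(n_facets, d_model))     # stand-in for learned anchors
doc_tokens = rng.normal(size=(200, d_model))       # long document
question_tokens = rng.normal(size=(12, d_model))   # short question

doc_facets = facet_pool(doc_tokens, anchors)
question_facets = facet_pool(question_tokens, anchors)
score = facet_score(question_facets, doc_facets)
```

Because both inputs are pooled against the same anchors, the same `facet_score` call can compare a question with a document or a document with another document, which is the property that lets one architecture serve both the q-doc and doc-doc settings in this sketch.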
Problem

Research questions and friction points this paper is trying to address.

scientific document retrieval
question-driven retrieval
document-centric representation
semantic mismatch
retrieval granularity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Retrieval Framework
Facet-Aware Representation
Multi-Granularity Aggregation
Question-Driven Retrieval
Joint Training
Zheng Dou
School of Computer Science and Engineering, Beihang University, Beijing, China
Zhao Zhang
School of Computer Science and Engineering, Beihang University, Beijing, China
Deqing Wang
School of Computer Science and Engineering, Beihang University, Beijing, China
Yikun Ban
Beihang University, University of Illinois Urbana-Champaign
Reinforcement Learning, Ensemble Learning
Fuzhen Zhuang
School of Artificial Intelligence, Beihang University, Beijing, China