PRISM: Fine-Grained Paper-to-Paper Retrieval with Multi-Aspect-Aware Query Optimization

📅 2025-07-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current scientific paper retrieval predominantly relies on abstract-based modeling, which fails to capture the multi-dimensional semantics of full texts and therefore limits fine-grained matching. To address this, we propose a full-text, segment-aware, multi-perspective document retrieval framework: it decomposes query papers into semantic views (e.g., method, experiment, conclusion) and performs aspect-aligned, fine-grained matching against corresponding sections of candidate papers; further, it integrates dense retrieval with paragraph-level semantic modeling via a multi-perspective query optimization mechanism and an aspect-specific embedding strategy. We introduce SciFullBench, the first benchmark supporting full-text, segment-level contrastive evaluation. On SciFullBench, our approach achieves an average 4.3% improvement in NDCG@10 over state-of-the-art baselines, improving the discovery of deep cross-paper semantic associations.
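For readers unfamiliar with the NDCG@10 metric mentioned above, here is a minimal sketch of how it is commonly computed. It assumes binary relevance (a candidate either is or is not a relevant paper) and a ranked list without duplicate IDs; the summary does not state how SciFullBench defines relevance or gain, and the function name `ndcg_at_k` is illustrative only.

```python
import math


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """NDCG@k with binary gains (assumption: relevance is 0 or 1)."""
    relevant = set(relevant_ids)
    # DCG: each relevant hit at 0-based position `rank` contributes 1 / log2(rank + 2)
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, paper_id in enumerate(ranked_ids[:k])
        if paper_id in relevant
    )
    # Ideal DCG: all relevant papers (up to k) placed at the top of the ranking
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg else 0.0
```

A ranking that places every relevant paper at the top scores 1.0, and the score decays logarithmically as relevant papers move further down the list.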

📝 Abstract

Scientific paper retrieval, particularly when framed as document-to-document retrieval, aims to identify relevant papers in response to a long-form query paper rather than a short query string. Previous approaches to this task have focused on abstracts, embedding them into dense vectors as surrogates for full documents and calculating similarity across them, although abstracts provide only sparse and high-level summaries. To address this, we propose PRISM, a novel document-to-document retrieval method that introduces multiple, fine-grained representations for both the query and candidate papers. In particular, each query paper is decomposed into multiple aspect-specific views that are individually embedded and then matched against candidate papers, which are similarly segmented, to account for their multifaceted dimensions. Moreover, we present SciFullBench, a novel benchmark in which the complete and segmented context of full papers is available for both queries and candidates. Experimental results show that PRISM improves performance by an average of 4.3% over existing retrieval baselines.
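The matching idea described in the abstract can be illustrated with a toy sketch. This is not PRISM's implementation: the bag-of-words `embed` function stands in for a dense encoder, and the scoring rule (best-matching segment per aspect view, averaged across views) is an assumption chosen for simplicity, since the abstract does not specify how per-aspect scores are combined. All names here are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query_views, candidates):
    """Rank candidate papers against a query paper split into aspect views.

    query_views: {aspect_name: text}, e.g. method / experiment views
    candidates:  {paper_id: [segment_text, ...]}
    """
    view_vecs = [embed(text) for text in query_views.values()]
    scores = {}
    for paper_id, segments in candidates.items():
        seg_vecs = [embed(segment) for segment in segments]
        # For each aspect view, keep its best-matching segment in this candidate
        per_view = [
            max((cosine(view, seg) for seg in seg_vecs), default=0.0)
            for view in view_vecs
        ]
        # Assumed fusion rule: average the per-view best scores
        scores[paper_id] = sum(per_view) / len(per_view) if per_view else 0.0
    return sorted(scores, key=scores.get, reverse=True)
```

With a query split into "method" and "experiment" views, a candidate whose segments overlap with both views outranks one that matches only a single view, which is the behavior an abstract-only embedding cannot express.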

Problem

Research questions and friction points this paper is trying to address.

Enhances paper retrieval using multi-aspect fine-grained representations

Addresses limitations of abstract-only dense vector embeddings

Introduces segmented matching for multifaceted document dimensions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained multi-aspect paper representations

Aspect-specific embedding and matching

Novel benchmark with full paper context

🔎 Similar Papers

Chain-of-Factors Paper-Reviewer Matching