Discourse-Aware Scientific Paper Recommendation via QA-Style Summarization and Multi-Level Contrastive Learning

📅 2025-11-05
🤖 AI Summary
Existing scientific paper recommendation methods often neglect document structure, which fragments semantics and limits interpretability. To address this, we propose OMRC-MR, a hierarchical structure-aware framework with three components: (1) QA-style structured abstracts built on the OMRC (Objective, Method, Result, Conclusion) schema, which explicitly model the logical flow of a paper; (2) a multi-granularity contrastive learning mechanism that operates at the section, document, and metadata levels to align semantics across levels; (3) a structure-aware re-ranking module that calibrates contextual similarity to improve both recommendation accuracy and interpretability. Experiments on DBLP, S2ORC, and the newly constructed Sci-OMRC dataset show that the approach consistently surpasses state-of-the-art baselines, with improvements of up to 7.2% in Precision@10 and 3.8% in Recall@10. These results support the effectiveness of structured document modeling and multi-level contrastive learning for scientific paper recommendation.

📝 Abstract
The rapid growth of open-access (OA) publications has intensified the challenge of identifying relevant scientific papers. Due to privacy constraints and limited access to user interaction data, recent efforts have shifted toward content-based recommendation, which relies solely on textual information. However, existing models typically treat papers as unstructured text, neglecting their discourse organization and thereby limiting semantic completeness and interpretability. To address these limitations, we propose OMRC-MR, a hierarchical framework that integrates QA-style OMRC (Objective, Method, Result, Conclusion) summarization, multi-level contrastive learning, and structure-aware re-ranking for scholarly recommendation. The QA-style summarization module converts raw papers into structured and discourse-consistent representations, while multi-level contrastive objectives align semantic representations across metadata, section, and document levels. The final re-ranking stage further refines retrieval precision through contextual similarity calibration. Experiments on DBLP, S2ORC, and the newly constructed Sci-OMRC dataset demonstrate that OMRC-MR consistently surpasses state-of-the-art baselines, achieving up to 7.2% and 3.8% improvements in Precision@10 and Recall@10, respectively. Additional evaluations confirm that QA-style summarization produces more coherent and factually complete representations. Overall, OMRC-MR provides a unified and interpretable content-based paradigm for scientific paper recommendation, advancing trustworthy and privacy-aware scholarly information retrieval.
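The abstract describes contrastive objectives that align representations at the metadata, section, and document levels, but it does not give the loss itself. The following is a minimal sketch, assuming an in-batch InfoNCE loss per level combined by a weighted sum. The function names, the temperature value, the level weights, and the way positive pairs are formed are all illustrative assumptions, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean in-batch InfoNCE loss.

    positives[i] is the positive for anchors[i]; every other entry of
    `positives` acts as a negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # log-sum-exp with the max subtracted for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def multi_level_loss(views, weights, temperature=0.1):
    """Weighted sum of per-level InfoNCE losses.

    views:   {level_name: (anchors, positives)}  (hypothetical layout)
    weights: {level_name: float}                 (assumed hyperparameters)
    """
    return sum(
        weights[level] * info_nce(anchors, positives, temperature)
        for level, (anchors, positives) in views.items()
    )
```

In this sketch a level such as "section" would pair, for example, the embedding of one paper's Method answer with the matching view of the same paper, while the other papers in the batch serve as negatives. Whether OMRC-MR pairs views this way is not stated in the abstract.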
Problem

Research questions and friction points this paper is trying to address.

Identifying relevant scientific papers from rapidly growing open-access publications
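Existing models treat papers as unstructured text and neglect discourse organization, limiting semantic completeness and interpretability
Privacy constraints and limited access to user interaction data restrict recommendation to content-based methods that rely solely on text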
Innovation

Methods, ideas, or system contributions that make the work stand out.

QA-style summarization converts papers into structured representations
Multi-level contrastive learning aligns semantic representations across levels
Structure-aware re-ranking refines retrieval through contextual similarity calibration
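As an illustration of how a structure-aware re-ranking step could work, the sketch below blends a first-stage retrieval score with the mean cosine similarity between matching OMRC sections of the query paper and each candidate. The blending weight `alpha`, the section keys, and the tuple layout are hypothetical. The source only says that re-ranking refines retrieval through contextual similarity calibration, so this is one possible reading rather than the authors' procedure.

```python
import math

# Section keys follow the OMRC schema named in the abstract.
SECTIONS = ("objective", "method", "result", "conclusion")


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank(query_sections, candidates, alpha=0.5):
    """Re-rank candidates by blending first-stage and section-level scores.

    query_sections: {section_name: embedding} for the query paper
    candidates:     list of (paper_id, first_stage_score, {section_name: embedding})
    alpha:          assumed weight on the first-stage score (0..1)

    Returns (paper_id, final_score, per_section_similarity) tuples,
    best first. The per-section similarities show which part of the
    paper drove the match.
    """
    rescored = []
    for paper_id, base_score, sections in candidates:
        per_section = {
            name: cosine(query_sections[name], sections[name]) for name in SECTIONS
        }
        structure_score = sum(per_section.values()) / len(SECTIONS)
        final_score = alpha * base_score + (1.0 - alpha) * structure_score
        rescored.append((paper_id, final_score, per_section))
    return sorted(rescored, key=lambda item: item[1], reverse=True)
```

Returning the per-section similarities alongside the final score is a design choice made here for interpretability: a reader can see, for instance, that a candidate ranked highly because its Method section matched even though its Result section did not.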
Shenghua Wang
Institute of Communication Studies, Communication University of China, Beijing, 100024, China
Zhen Yin
Senior Research Scientist, Stanford University
Data Science for Geoscience, Critical Minerals, Decision Making under Subsurface Uncertainty