Discourse-Aware Scientific Paper Recommendation via QA-Style Summarization and Multi-Level Contrastive Learning

📅 2025-11-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing scientific paper recommendation methods often treat papers as unstructured text and neglect their discourse structure, which fragments semantics and hurts interpretability. To address this, we propose a hierarchical structure-aware framework: (1) we first construct question-answer-style structured summaries following the OMRC (Objective, Method, Result, Conclusion) scheme to explicitly model the logical flow of a paper; (2) we design a multi-granularity contrastive learning mechanism, operating at the section, document, and metadata levels, to align semantics across levels; (3) we introduce a context-calibrated, structure-aware re-ranking module to improve both recommendation accuracy and interpretability. Experiments on DBLP, S2ORC, and the newly constructed Sci-OMRC dataset show that the approach consistently outperforms state-of-the-art baselines, with improvements of up to 7.2% in Precision@10 and 3.8% in Recall@10. These results support the effectiveness of structured document modeling and multi-level contrastive learning for scientific paper recommendation.
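For readers unfamiliar with the reported metrics, the sketch below shows the standard definitions of Precision@k and Recall@k. The function names and paper IDs are illustrative assumptions; the paper's own evaluation code is not reproduced here.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k recommended papers that are relevant.

    Divides by k even when fewer than k items were returned, which is
    one common convention (an assumption, not taken from the paper).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for paper_id in ranked[:k] if paper_id in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of all relevant papers that appear in the top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    hits = sum(1 for paper_id in ranked[:k] if paper_id in relevant)
    return hits / len(relevant)


# Hypothetical example: ten recommendations, four truly relevant papers,
# one of which ("p42") was never recommended.
ranked_ids = [f"p{i}" for i in range(1, 11)]
relevant_ids = {"p1", "p4", "p9", "p42"}
p_at_10 = precision_at_k(ranked_ids, relevant_ids, k=10)
r_at_10 = recall_at_k(ranked_ids, relevant_ids, k=10)
```

In this toy case three of the ten recommendations are relevant, so Precision@10 is 0.3, and three of the four relevant papers are retrieved, so Recall@10 is 0.75.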

📝 Abstract

The rapid growth of open-access (OA) publications has intensified the challenge of identifying relevant scientific papers. Due to privacy constraints and limited access to user interaction data, recent efforts have shifted toward content-based recommendation, which relies solely on textual information. However, existing models typically treat papers as unstructured text, neglecting their discourse organization and thereby limiting semantic completeness and interpretability. To address these limitations, we propose OMRC-MR, a hierarchical framework that integrates QA-style OMRC (Objective, Method, Result, Conclusion) summarization, multi-level contrastive learning, and structure-aware re-ranking for scholarly recommendation. The QA-style summarization module converts raw papers into structured and discourse-consistent representations, while multi-level contrastive objectives align semantic representations across metadata, section, and document levels. The final re-ranking stage further refines retrieval precision through contextual similarity calibration. Experiments on DBLP, S2ORC, and the newly constructed Sci-OMRC dataset demonstrate that OMRC-MR consistently surpasses state-of-the-art baselines, achieving up to 7.2% and 3.8% improvements in Precision@10 and Recall@10, respectively. Additional evaluations confirm that QA-style summarization produces more coherent and factually complete representations. Overall, OMRC-MR provides a unified and interpretable content-based paradigm for scientific paper recommendation, advancing trustworthy and privacy-aware scholarly information retrieval.
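As a rough illustration of what a QA-style OMRC representation could look like, the sketch below stores one answer per discourse role and renders the result as question-answer text. The class name, the question templates, and the example answers are all hypothetical; the abstract does not specify the actual prompts or output format used by OMRC-MR.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical question templates, one per OMRC role (not from the paper).
OMRC_QUESTIONS: Dict[str, str] = {
    "objective": "What problem does the paper address?",
    "method": "How does the paper address it?",
    "result": "What are the main findings?",
    "conclusion": "What do the authors conclude?",
}


@dataclass
class OMRCSummary:
    """One answer per discourse role: Objective, Method, Result, Conclusion."""

    objective: str
    method: str
    result: str
    conclusion: str

    def qa_pairs(self) -> List[Tuple[str, str]]:
        # Dict insertion order keeps the O-M-R-C sequence intact.
        return [(question, getattr(self, role))
                for role, question in OMRC_QUESTIONS.items()]

    def to_text(self) -> str:
        # Flatten to a single string that a text encoder could consume.
        return "\n".join(f"Q: {q}\nA: {a}" for q, a in self.qa_pairs())


# Example answers paraphrased from the abstract above.
summary = OMRCSummary(
    objective="Recommend relevant papers using text alone.",
    method="QA-style summarization, multi-level contrastive learning, re-ranking.",
    result="Up to 7.2% and 3.8% gains in Precision@10 and Recall@10.",
    conclusion="Discourse-aware modeling improves content-based recommendation.",
)
structured_text = summary.to_text()
```

Keeping the four roles as separate fields, rather than one free-text summary, is what would let later stages compare papers role by role.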

Problem

Research questions and friction points this paper is trying to address.

Identifying relevant scientific papers from rapidly growing open-access publications

Existing models treat papers as unstructured text and neglect discourse organization, limiting semantic completeness and interpretability

Privacy constraints and limited access to user interaction data force reliance on content-based recommendation from text alone

Innovation

Methods, ideas, or system contributions that make the work stand out.

QA-style summarization converts papers into structured representations

Multi-level contrastive learning aligns semantic representations across levels

Structure-aware re-ranking refines retrieval through contextual similarity calibration
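The multi-level contrastive idea listed above can be sketched with a generic InfoNCE loss applied once per level and then summed. This is a minimal illustration under stated assumptions: the choice of InfoNCE with in-batch negatives, cosine similarity, the temperature value, and the per-level weights are all assumptions, since the summary does not give the exact objectives used by OMRC-MR.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def info_nce(anchors: List[Vector], positives: List[Vector],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch.

    positives[i] is the matching view of anchors[i]; every other entry
    of `positives` serves as an in-batch negative for anchors[i].
    """
    if not anchors or len(anchors) != len(positives):
        raise ValueError("anchors and positives must be non-empty and equal length")
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def multi_level_loss(views: Dict[str, Tuple[List[Vector], List[Vector]]],
                     weights: Dict[str, float],
                     temperature: float = 0.1) -> float:
    """Weighted sum of per-level InfoNCE losses (weights are hypothetical)."""
    return sum(weights[level] * info_nce(a, p, temperature)
               for level, (a, p) in views.items())


# Toy 2-D embeddings: aligned pairs should score a much lower loss
# than the same vectors with the pairing swapped.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = info_nce(anchors, aligned)
loss_swapped = info_nce(anchors, swapped)
total = multi_level_loss(
    {"section": (anchors, aligned),
     "document": (anchors, aligned),
     "metadata": (anchors, aligned)},
    weights={"section": 1.0, "document": 1.0, "metadata": 1.0},
)
```

In a real system the vectors would come from a trained encoder and the loss would be computed in an autodiff framework; plain Python is used here only to keep the arithmetic visible.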

🔎 Similar Papers

Rethinking Scientific Summarization Evaluation: Grounding Explainable Metrics on Facet-aware Benchmark