SciNUP: Natural Language User Interest Profiles for Scientific Literature Recommendation

📅 2025-10-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
The lack of large-scale, publicly available test collections for natural language (NL) user interest profiles hinders research on explainable scientific literature recommendation. To address this, the paper introduces SciNUP, a novel synthetic dataset for scholarly recommendation designed for NL-based user profiling. It is generated automatically from researchers' real publication histories, which yield both the NL interest profiles and the corresponding ground truth items. The authors use the dataset to compare sparse retrieval, dense retrieval, and LLM-based re-ranking. The results show that these baseline methods achieve comparable overall performance yet often retrieve different items, which suggests that combining them could be beneficial. Considerable headroom for improvement also remains. The dataset is intended as a benchmark and resource for future research on NL profile-based recommendation.

📝 Abstract
The use of natural language (NL) user profiles in recommender systems offers greater transparency and user control compared to traditional representations. However, there is a scarcity of large-scale, publicly available test collections for evaluating NL profile-based recommendation. To address this gap, we introduce SciNUP, a novel synthetic dataset for scholarly recommendation that leverages authors' publication histories to generate NL profiles and corresponding ground truth items. We use this dataset to conduct a comparison of baseline methods, ranging from sparse and dense retrieval approaches to state-of-the-art LLM-based rerankers. Our results show that while baseline methods achieve comparable performance, they often retrieve different items, indicating complementary behaviors. At the same time, considerable headroom for improvement remains, highlighting the need for effective NL-based recommendation approaches. The SciNUP dataset thus serves as a valuable resource for fostering future research and development in this area.
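
One simple way to make the "they often retrieve different items" observation concrete is to measure the overlap between the top-k lists of two methods. The sketch below uses Jaccard overlap, which is an assumption for illustration; the abstract does not say which overlap measure the authors used, and the paper IDs and the function name `topk_jaccard` are hypothetical.

```python
def topk_jaccard(run_a, run_b, k=10):
    """Jaccard overlap between the top-k items of two ranked lists.

    Returns a value in [0, 1]; lower values mean the two methods
    surface more distinct items at the top of their rankings.
    """
    top_a, top_b = set(run_a[:k]), set(run_b[:k])
    union = top_a | top_b
    if not union:
        return 0.0  # both lists empty: define overlap as zero
    return len(top_a & top_b) / len(union)


# Hypothetical rankings for one NL profile (illustrative IDs only).
sparse_run = ["p1", "p2", "p3", "p4"]
dense_run = ["p3", "p5", "p1", "p6"]

overlap = topk_jaccard(sparse_run, dense_run, k=4)
print(f"top-4 Jaccard overlap: {overlap:.3f}")
```

Two methods can have similar effectiveness scores and still show low overlap, which is the situation the abstract describes as complementary behavior.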
Problem

Research questions and friction points this paper is trying to address.

Addressing scarcity of test collections for NL profile recommendation
Evaluating baseline methods for scholarly literature recommendation
Providing dataset to foster NL-based recommendation research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic dataset generation from author publication histories
Comparison of sparse retrieval, dense retrieval, and LLM-based rerankers
Identification of complementary behaviors and headroom for improvement
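
When rankers of similar quality return different items, a common follow-up is to fuse their ranked lists. The sketch below uses reciprocal rank fusion (RRF), a standard fusion technique chosen here purely as an illustration; it is not described in the summary above as the authors' method. The function name, the paper IDs, and the constant `k=60` (a conventional default for RRF) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(runs, k=60):
    """Fuse several ranked lists with reciprocal rank fusion (RRF).

    Each item receives 1 / (k + rank) from every list it appears in,
    with ranks starting at 1. Items are returned by descending fused
    score; ties are broken by item ID so the output is deterministic.
    """
    scores = defaultdict(float)
    for run in runs:
        for rank, item in enumerate(run, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical top-3 lists from a sparse and a dense retriever.
sparse_run = ["p1", "p2", "p3"]
dense_run = ["p3", "p4", "p1"]

fused = reciprocal_rank_fusion([sparse_run, dense_run])
print(fused)
```

Items found by both retrievers (here `p1` and `p3`) rise to the top, while items found by only one are kept further down the fused list rather than discarded.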