🤖 AI Summary
Existing scientific literature retrieval methods struggle to capture fine-grained scientific concepts because they rely on holistic text embeddings and lack domain-specific semantic understanding. LLM-based entity extraction approaches exist, but they typically treat entities as isolated tokens and neglect their rich, multi-faceted semantic structure. To address this, we propose PairSem, a framework that introduces *entity-aspect pairs* as the fundamental unit for modeling scientific concepts, explicitly representing their multidimensional semantics. PairSem uses LLMs in an unsupervised manner to jointly extract scientific entities and their associated aspects (e.g., methods, materials, tasks), enabling fine-grained semantic matching without labeled data and offering plug-and-play compatibility with arbitrary retrievers. Extensive experiments across multiple benchmark datasets and state-of-the-art retrieval models demonstrate significant improvements in retrieval accuracy, validating both the effectiveness and the generalizability of multidimensional semantic modeling.
📝 Abstract
Scientific document retrieval is a critical task for enabling knowledge discovery and supporting research across diverse domains. However, existing dense retrieval methods often struggle to capture fine-grained scientific concepts in texts due to their reliance on holistic embeddings and limited domain understanding. Recent approaches leverage large language models (LLMs) to extract fine-grained semantic entities and enhance semantic matching, but they typically treat entities as independent fragments, overlooking the multi-faceted nature of scientific concepts. To address this limitation, we propose Pairwise Semantic Matching (PairSem), a framework that represents relevant semantics as entity-aspect pairs, capturing complex, multi-faceted scientific concepts. PairSem is unsupervised, base retriever-agnostic, and plug-and-play, enabling precise and context-aware matching without requiring query-document labels or entity annotations. Extensive experiments on multiple datasets and retrievers demonstrate that PairSem significantly improves retrieval performance, highlighting the importance of modeling multi-aspect semantics in scientific information retrieval.
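To make the idea of entity-aspect pair matching concrete, the following is a minimal sketch, not the authors' implementation. It assumes the pairs have already been extracted (here they are hard-coded rather than produced by an LLM), and it assumes a simple Jaccard overlap combined linearly with the base retriever's score. The names `pair_overlap`, `combined_score`, `rerank`, and the `weight` parameter are hypothetical; the abstract does not specify how PairSem scores or combines matches.

```python
from typing import List, Set, Tuple

# An entity-aspect pair, e.g. ("transformer", "method").
Pair = Tuple[str, str]

def pair_overlap(query_pairs: Set[Pair], doc_pairs: Set[Pair]) -> float:
    """Jaccard overlap between two sets of entity-aspect pairs (assumed metric)."""
    if not query_pairs or not doc_pairs:
        return 0.0
    return len(query_pairs & doc_pairs) / len(query_pairs | doc_pairs)

def combined_score(base_score: float, query_pairs: Set[Pair],
                   doc_pairs: Set[Pair], weight: float = 0.5) -> float:
    """Linearly interpolate the base retriever score with the pair overlap.

    The interpolation and its weight are illustrative assumptions.
    """
    return (1.0 - weight) * base_score + weight * pair_overlap(query_pairs, doc_pairs)

def rerank(query_pairs: Set[Pair],
           candidates: List[Tuple[str, float, Set[Pair]]],
           weight: float = 0.5) -> List[Tuple[str, float]]:
    """Re-rank (doc_id, base_score, doc_pairs) candidates by combined score."""
    scored = [
        (doc_id, combined_score(base, query_pairs, pairs, weight))
        for doc_id, base, pairs in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hand-written pairs standing in for LLM output (purely illustrative).
QUERY_PAIRS: Set[Pair] = {("transformer", "method"), ("summarization", "task")}
CANDIDATES = [
    ("doc_a", 0.60, {("transformer", "method"), ("summarization", "task")}),
    ("doc_b", 0.70, {("transformer", "method"), ("translation", "task")}),
]

if __name__ == "__main__":
    for doc_id, score in rerank(QUERY_PAIRS, CANDIDATES):
        print(f"{doc_id}: {score:.3f}")
```

In this toy example, `doc_b` has the higher base score but shares only the entity, not the task aspect, so `doc_a` ranks first after re-ranking. Because the pair score is added on top of whatever base score is supplied, the sketch is independent of the underlying retriever, which mirrors the plug-and-play property described above.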