AI Summary
This work addresses semantic relevance judgment in knowledge-intensive search, particularly cases that require implicit intent understanding, factual equivalence recognition, and fine-grained discrimination. The authors propose a three-stage framework: first, knowledge-injected pretraining enhanced by retrieval augmentation; second, hierarchical reasoning alignment to model structured semantic relationships; and third, preference learning on boundary cases to calibrate decision boundaries. On a real-world search relevance benchmark, the approach consistently outperforms strong large language model (LLM) baselines on ranking metrics such as NDCG and MRR, indicating more accurate, fine-grained, and well-calibrated relevance modeling.
Abstract
Semantic relevance judgment for search is particularly challenging in knowledge-intensive scenarios, where accurate ranking requires not only semantic matching but also background grounding, multi-step reasoning, and well-calibrated decision boundaries. Existing relevance models mainly rely on direct label supervision or shallow semantic similarity, which limits their ability to handle implicit intent, factual equivalence, and fine-grained relevance distinctions. To address this issue, we propose RAG-Match, a three-stage framework that integrates knowledge-augmented pretraining, hierarchical reasoning alignment, and preference-based decision calibration for relevance modeling. The key idea is to first strengthen query-centered semantic grounding, then align the model with structured relevance reasoning, and finally correct decision-level inconsistencies in difficult boundary cases. Experimental results on a real-world search relevance benchmark show that RAG-Match consistently outperforms strong LLM-based baselines across multiple ranking metrics, demonstrating the effectiveness of combining knowledge injection, reasoning supervision, and preference optimization for fine-grained relevance judgment.
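For readers unfamiliar with the ranking metrics named in the summary, the following is a minimal sketch of the standard textbook definitions of NDCG@k and MRR. It is not taken from the paper: the function names, the exponential gain form `2^g - 1`, and the label conventions are assumptions chosen for illustration, and the authors' evaluation setup may differ.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top-k graded labels (ranked order)."""
    # Assumed gain form: 2^g - 1, discounted by log2(position + 1).
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


def mrr(rankings):
    """Mean reciprocal rank; each ranking is a list of 0/1 labels in ranked order."""
    total = 0.0
    for labels in rankings:
        # Position (1-based) of the first relevant result, if any.
        rank = next((i + 1 for i, rel in enumerate(labels) if rel > 0), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(rankings) if rankings else 0.0


if __name__ == "__main__":
    # Hypothetical graded labels for one query, in the order a model ranked them.
    print(ndcg_at_k([2, 3, 0, 1], k=4))
    # Hypothetical binary labels for two queries.
    print(mrr([[0, 1, 0], [1, 0, 0]]))
```

Both metrics reward placing relevant documents near the top of the list; NDCG additionally uses graded labels, which makes it sensitive to the fine-grained relevance distinctions the abstract emphasizes.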