🤖 AI Summary
In natural language recommendation (NLRec), dense retrieval relies on a single query embedding, so its scoring function is unimodal and cannot capture relevance that is spread over several distinct regions of the embedding space. To address this, we propose a relevance modeling approach based on Gaussian process regression (GPR). Using GPR with an RBF kernel, our method fits large language model (LLM) relevance judgments between the user request and item description passages, which allows the learned scoring function to have multiple peaks (i.e., to be multimodal). The GPR is fit on LLM judgments for only a subset of candidate passages, which keeps the LLM labeling cost low. Experiments on four NLRec benchmarks with two LLM backbones show improvements of up to 65% over dense retrieval, cross-encoders, and pointwise LLM relevance scoring, and show that the RBF kernel outperforms unimodal kernels (dot product, cosine similarity), supporting the value of multimodal relevance modeling for NLRec.
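The unimodal-versus-multimodal contrast can be made concrete with the standard zero-mean GPR posterior mean. This is a textbook formulation written here as an assumption: the summary does not state the mean function, the noise variance σ², or the lengthscale ℓ. Below, q is the request embedding, e a passage embedding, L the set of LLM-labeled passages with embeddings E_L = {e_i} and judgments y_L, and (K_LL)_ij = k(e_i, e_j).

```latex
s_{\mathrm{DR}}(e) = \langle q, e \rangle,
\qquad
k(e, e') = \exp\!\left( -\frac{\lVert e - e' \rVert^2}{2\ell^2} \right)

\mu(e) = k(e, E_L)^\top \left( K_{LL} + \sigma^2 I \right)^{-1} y_L
       = \sum_{i \in L} \alpha_i \exp\!\left( -\frac{\lVert e - e_i \rVert^2}{2\ell^2} \right),
\qquad
\alpha = \left( K_{LL} + \sigma^2 I \right)^{-1} y_L
```

Over unit-norm embeddings, s_DR is maximized only at e = q/‖q‖, a single peak. The posterior mean μ is instead a weighted sum of RBF bumps centered on the labeled passages, so it can peak near each separate cluster of passages the LLM judged relevant.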
📝 Abstract
Natural Language Recommendation (NLRec) generates item suggestions based on the relevance between user-issued natural-language (NL) requests and NL item description passages. Existing NLRec approaches often use Dense Retrieval (DR) to compute item relevance scores by aggregating inner products between the request embedding and the embeddings of relevant passages. However, DR treats the request as the sole relevance label, which yields a unimodal scoring function centered on the query embedding; such a function is often a weak proxy for relevance. To better capture the potentially multimodal relevance scoring function that can arise from complex NLRec data, we propose GPR-LLM, which applies Gaussian Process Regression (GPR) to LLM relevance judgments for a subset of candidate passages. Experiments on four NLRec datasets and two LLM backbones show that GPR-LLM with an RBF kernel, which can model multimodal relevance scoring functions, consistently outperforms simpler unimodal kernels (dot product, cosine similarity) as well as baselines including DR, cross-encoders, and pointwise LLM-based relevance scoring, by up to 65%. Overall, GPR-LLM offers an efficient and effective approach to NLRec under a small LLM labeling budget.
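A minimal numpy sketch of the contrast the abstract draws, under loudly stated assumptions: 2-D toy embeddings stand in for a real encoder, a hard-coded dictionary (`llm_labels`) stands in for LLM relevance judgments, passage scores are max-pooled into item scores, and the labeled subset is hand-picked. The abstract does not specify the aggregator, how the labeled subset is chosen, or the kernel hyperparameters, and every function name here is hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5):
    """RBF kernel exp(-||a - b||^2 / (2 * lengthscale^2)) for every row pair."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale ** 2))

def dot_kernel(A, B):
    """Unimodal baseline kernel: plain inner product (linear in the embedding)."""
    return A @ B.T

def gpr_posterior_mean(kernel, X_lab, y_lab, X_all, noise=1e-2):
    """Zero-mean GP posterior mean k(X_all, X_lab) (K + noise * I)^{-1} y_lab."""
    K = kernel(X_lab, X_lab) + noise * np.eye(len(X_lab))
    alpha = np.linalg.solve(K, y_lab)
    return kernel(X_all, X_lab) @ alpha

def aggregate_max(passage_scores, passage_item, n_items):
    """Item score = max over its passages (the aggregator is an assumption)."""
    out = np.full(n_items, -np.inf)
    np.maximum.at(out, passage_item, passage_scores)
    return out

# Toy 2-D "embeddings": items 0 and 1 are relevant but lie in two separate
# regions of the space; item 2 is irrelevant.
P = np.array([[1.0, 0.0], [0.9, 0.1],                  # item 0
              [-1.0, 0.0], [-0.9, -0.1],               # item 1
              [0.0, 1.0], [0.1, 0.9], [0.0, -1.0]])    # item 2
passage_item = np.array([0, 0, 1, 1, 2, 2, 2])
q = np.array([0.8, 0.6])  # hypothetical request embedding

# Dense retrieval: inner product with the single request embedding.
dr_items = aggregate_max(P @ q, passage_item, 3)

# Hypothetical stand-in for LLM relevance judgments on a labeled subset.
llm_labels = {0: 1.0, 2: 1.0, 4: 0.0, 6: 0.0}
lab_idx = np.array(list(llm_labels))
y_lab = np.array([llm_labels[i] for i in lab_idx])

gpr_items = aggregate_max(
    gpr_posterior_mean(rbf_kernel, P[lab_idx], y_lab, P), passage_item, 3)
dot_items = aggregate_max(
    gpr_posterior_mean(dot_kernel, P[lab_idx], y_lab, P), passage_item, 3)

print("DR ranking:", np.argsort(-dr_items))   # relevant item 1 ranked last
print("RBF GPR:   ", np.round(gpr_items, 3))  # both relevant items score high
print("Dot GPR:   ", np.round(dot_items, 3))  # a linear fit stays near zero
```

Because the two relevant items sit on opposite sides of the origin, any single request embedding scores one of them high and the other low, and a dot-product-kernel GPR reduces to a linear function that cannot fit both (in this toy it is flat at about zero). The RBF posterior mean places a bump on each labeled relevant passage instead. A cosine-similarity kernel is the same inner product applied to normalized embeddings, so its posterior mean also peaks in a single direction.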