Multimodal Item Scoring for Natural Language Recommendation via Gaussian Process Regression with LLM Relevance Judgments

📅 2025-10-24
🤖 AI Summary
In natural language recommendation (NLRec), dense retrieval scores items against a single query embedding, which yields a unimodal scoring function that cannot capture multimodal relevance distributions. To address this, we propose GPR-LLM, a Gaussian process regression (GPR) approach to relevance modeling. Using an RBF kernel, the method fits a relevance scoring function to judgments that a large language model (LLM) produces for a subset of candidate item passages, so the function can be multimodal (multi-peaked) rather than centered on the query embedding alone. Because only a subset of passages is labeled, the LLM labeling budget stays small. Experiments on four NLRec datasets with two LLM backbones show that GPR-LLM with an RBF kernel outperforms unimodal kernels (dot product, cosine similarity) as well as dense retrieval, cross-encoder, and pointwise LLM-based relevance scoring baselines by up to 65%.

📝 Abstract
Natural Language Recommendation (NLRec) generates item suggestions based on the relevance between user-issued NL requests and NL item description passages. Existing NLRec approaches often use Dense Retrieval (DR) to compute item relevance scores by aggregating inner products between user request embeddings and relevant passage embeddings. However, DR views the request as the sole relevance label, which leads to a unimodal scoring function centered on the query embedding that is often a weak proxy for query relevance. To better capture the potential multimodal distribution of the relevance scoring function that may arise from complex NLRec data, we propose GPR-LLM, which uses Gaussian Process Regression (GPR) with LLM relevance judgments for a subset of candidate passages. Experiments on four NLRec datasets and two LLM backbones demonstrate that GPR-LLM with an RBF kernel, capable of modeling multimodal relevance scoring functions, consistently outperforms simpler unimodal kernels (dot product, cosine similarity), as well as baseline methods including DR, cross-encoder, and pointwise LLM-based relevance scoring, by up to 65%. Overall, GPR-LLM provides an efficient and effective approach to NLRec within a minimal LLM labeling budget.
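
For readers unfamiliar with GPR, the standard textbook form of the posterior mean may help make the kernel contrast in the abstract concrete. The notation here is an assumption of this note, not taken from the paper: x_i are embeddings of the labeled passages, y collects their LLM relevance judgments, σ² is an observation-noise variance, ℓ is an RBF length scale, and the prior mean is taken to be zero. The paper's exact parameterization may differ.

```latex
\mu(x_*) = k_*^\top \left(K + \sigma^2 I\right)^{-1} y,
\qquad K_{ij} = k(x_i, x_j), \quad (k_*)_i = k(x_*, x_i)

k_{\mathrm{RBF}}(x, x') = \exp\!\left(-\frac{\lVert x - x' \rVert^2}{2\ell^2}\right),
\qquad
k_{\mathrm{dot}}(x, x') = x^\top x'
```

With the dot-product kernel, μ(x_*) is linear in x_*, so the score grows along a single direction in embedding space. With the RBF kernel, μ(x_*) is a weighted sum of bumps centered on the labeled passages, so it can peak in several separate regions.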

Problem

Research questions and friction points this paper is trying to address.

Modeling multimodal relevance scoring in natural language recommendation
Overcoming unimodal limitations of dense retrieval methods
Improving recommendation accuracy with minimal LLM labeling budget
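
The unimodal limitation listed above can be illustrated with a toy sketch of dense retrieval scoring. Everything in it is hypothetical: the 2-D embeddings, the item names, and the choice of averaging the top-k passage scores as the aggregation step (the abstract only says that inner products are aggregated). It is meant as an illustration under those assumptions, not as the baseline used in the paper.

```python
# Toy illustration: single-query dense retrieval scoring.
# Embeddings, item names, and the top-k mean aggregation are assumptions.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def dr_item_score(query_emb, passage_embs, top_k=2):
    """Aggregate passage-level inner products into one item score
    by averaging the top_k highest-scoring passages."""
    scores = sorted((dot(query_emb, p) for p in passage_embs), reverse=True)
    top = scores[:top_k]
    return sum(top) / len(top)

query = [1.0, 0.0]
items = {
    # Passages that lie close to the query embedding.
    "item_near_query": [[0.9, 0.1], [0.8, 0.2]],
    # Passages in a different region of embedding space; they might still
    # be relevant, but the query embedding alone cannot express that.
    "item_other_mode": [[0.1, 0.9], [0.0, 1.0]],
}

dr_scores = {name: dr_item_score(query, embs) for name, embs in items.items()}
ranking = sorted(dr_scores, key=dr_scores.get, reverse=True)
```

Because every score is an inner product with the same query vector, the item whose passages point away from the query is ranked last regardless of whether it would be judged relevant.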

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Gaussian Process Regression for scoring
Incorporates LLM relevance judgments on passages
Models multimodal relevance with RBF kernel
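
The three ingredients above can be sketched together in a small, dependency-free example. It is a minimal sketch under stated assumptions, not the authors' implementation: the 2-D embeddings are made up, the `llm_labels` list stands in for judgments an LLM would produce (no LLM is called), the noise and length-scale values are arbitrary, and the helper names (`gpr_fit`, `gpr_score`, `solve`) are hypothetical. It scores passages only and leaves out any aggregation to item-level scores.

```python
import math

def rbf_kernel(u, v, length_scale=0.3):
    """RBF kernel: similarity decays with squared Euclidean distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

def dot_kernel(u, v):
    """Dot-product kernel: yields a scoring function linear in the embedding."""
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gpr_fit(X, y, kernel, noise=1e-2):
    """Return alpha = (K + noise * I)^-1 y for a zero-mean GP."""
    K = [[kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    return solve(K, y)

def gpr_score(x_new, X, alpha, kernel):
    """Posterior mean at x_new: sum_i alpha_i * k(x_new, x_i)."""
    return sum(a * kernel(x_new, xi) for a, xi in zip(alpha, X))

# Hypothetical embeddings of the labeled passage subset: two relevant
# clusters with an irrelevant passage halfway between them.
labeled = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5]]
# Stand-in for LLM relevance judgments in [0, 1] (assumed values).
llm_labels = [1.0, 0.9, 1.0, 0.8, 0.0]

# Unlabeled candidate passages to score.
candidates = {
    "near_cluster_a": [0.95, 0.05],
    "near_cluster_b": [0.05, 0.95],
    "between_clusters": [0.5, 0.45],
}

scores = {}
for name, kernel in (("rbf", rbf_kernel), ("dot", dot_kernel)):
    alpha = gpr_fit(labeled, llm_labels, kernel)
    scores[name] = {c: gpr_score(x, labeled, alpha, kernel)
                    for c, x in candidates.items()}
```

On this toy data the RBF kernel gives high scores near both relevant clusters and a low score between them, while the dot-product kernel, being linear in the embedding, gives all three candidates similar scores and cannot separate the irrelevant region from the relevant ones.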