LLM-as-a-Judge: Toward World Models for Slate Recommendation Systems

📅 2025-11-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses cross-domain user preference modeling in slate recommendation. It uses large language models (LLMs) as “world models” of user preferences, encoding the preference function through the LLMs’ pairwise comparison capability, without fine-tuning, which enables zero-shot use across multiple tasks and datasets. Key contributions are: (1) framing LLMs as general-purpose world models of user preferences; (2) an empirical evaluation on three slate recommendation tasks that relates task performance to properties of the preference function captured by the LLM, such as smoothness and transitivity; and (3) identifying alignment between prompt design and the underlying preference structure as an important axis for improvement. The results point to areas for improvement and to the potential of LLMs as world models in recommender systems.

📝 Abstract

Modeling user preferences across domains remains a key challenge in slate recommendation (i.e., recommending an ordered sequence of items) research. We investigate how Large Language Models (LLMs) can effectively act as world models of user preferences through pairwise reasoning over slates. We conduct an empirical study involving several LLMs on three tasks spanning different datasets. Our results reveal relationships between task performance and properties of the preference function captured by LLMs, hinting towards areas for improvement and highlighting the potential of LLMs as world models in recommender systems.

Problem

Research questions and friction points this paper is trying to address.

Modeling user preferences across domains in slate recommendation

Using LLMs as world models through pairwise reasoning

Investigating how task performance relates to properties of the preference function

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs act as world models for user preferences

Using pairwise reasoning over recommendation slates

Empirical study across multiple tasks and datasets
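
To make the idea of pairwise reasoning over slates concrete, here is a minimal sketch, not the paper's implementation. It assumes a hypothetical `judge(a, b)` callable that returns True when slate `a` is preferred over slate `b`; in practice this would wrap an LLM prompt, but `toy_judge` below is a purely illustrative stand-in so the example runs offline. The sketch ranks slates by pairwise wins and measures how often the judge violates transitivity, one of the preference-function properties mentioned in the summary.

```python
from itertools import combinations, permutations
from typing import Callable, Sequence

Slate = tuple[str, ...]
# Hypothetical interface: returns True if the first slate is preferred.
Judge = Callable[[Slate, Slate], bool]


def toy_judge(a: Slate, b: Slate) -> bool:
    """Illustrative stand-in for an LLM call (assumption, not from the paper).

    Prefers the slate with more distinct items; ties are broken by
    lexicographic order so the comparison is deterministic.
    """
    return (len(set(a)), a) > (len(set(b)), b)


def rank_by_wins(slates: Sequence[Slate], judge: Judge) -> list[Slate]:
    """Order slates by how many pairwise comparisons each one wins."""
    wins = {s: 0 for s in slates}
    for a, b in combinations(slates, 2):
        winner = a if judge(a, b) else b
        wins[winner] += 1
    return sorted(slates, key=lambda s: wins[s], reverse=True)


def transitivity_violation_rate(slates: Sequence[Slate], judge: Judge) -> float:
    """Fraction of ordered triples with a > b and b > c but not a > c."""
    checked = 0
    violations = 0
    for a, b, c in permutations(slates, 3):
        if judge(a, b) and judge(b, c):
            checked += 1
            if not judge(a, c):
                violations += 1
    return violations / checked if checked else 0.0


if __name__ == "__main__":
    slates = [("a", "a", "a"), ("a", "b", "c"), ("a", "a", "b")]
    print(rank_by_wins(slates, toy_judge))
    print(transitivity_violation_rate(slates, toy_judge))
```

Counting wins is only one simple way to turn pairwise judgments into a ranking; it is chosen here for brevity. A judge with a high violation rate does not induce a consistent ordering, which is why transitivity is a natural property to inspect when using pairwise comparisons as a preference model.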

🔎 Similar Papers

Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application