User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems

📅 2025-04-07

🤖 AI Summary

To address feedback loops in exploratory recommendation, and the ossified interest boundaries that result from sparse user-exploration signals in large-scale recommender systems, this paper proposes a dual-LLM framework that combines hierarchical planning with inference-time scaling. The framework decouples the objectives of “novelty” and “user alignment” by training a separate LLM for each: the novelty-focused LLM plans the user's next novel interest and, through inference-time scaling, produces n candidate predictions, and the user-aligned LLM then selects the best of those n candidates. By keeping the two objectives in separate models, the method preserves the LLMs' knowledge and reasoning capabilities while mitigating preference drift and signal bias. Online A/B testing reports substantial improvements: +12.7% in user watch time and +8.3% in active users, alongside gains in recommendation diversity (+19.5%), exploration depth (+24.1%), and user satisfaction (+15.6%).
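
As a rough back-of-the-envelope illustration of why scaling up the number of samples can help (this idealized model is ours, not the paper's): suppose each of the n candidates drawn from the novelty-focused LLM is, independently, well aligned with the user with probability p, and the user-aligned selector always recognizes an aligned candidate when one is present. Then the chance that best-of-n returns an aligned candidate is

```latex
P(\text{aligned pick}) = 1 - (1 - p)^{n},
\qquad \text{e.g. } p = 0.2,\; n = 8:\quad 1 - 0.8^{8} = 1 - 0.16777216 \approx 0.83
```

In practice, samples from one model are correlated and the selector is imperfect, so the real gain from increasing n is likely smaller than this idealized figure.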

📝 Abstract

Exploration, the act of broadening user experiences beyond their established preferences, is challenging in large-scale recommendation systems due to feedback loops and limited signals on user exploration patterns. Large Language Models (LLMs) show promise here because they can leverage their world knowledge to recommend novel content outside these loops. A key challenge is aligning LLMs with user preferences while preserving their knowledge and reasoning. Using LLMs to plan the next novel user interest, this paper introduces an approach that combines hierarchical planning with LLM inference-time scaling to improve recommendation relevance without compromising novelty. We decouple novelty and user alignment, training a separate LLM for each objective. We then scale up the novelty-focused LLM's inference and select the best of its n predictions using the user-aligned LLM. Live experiments demonstrate efficacy, showing significant gains in both user satisfaction (measured by watch activity and active user counts) and exploration diversity.
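
A minimal sketch of the decoupled best-of-n step described in the abstract, assuming each model is exposed as a plain Python callable: `sample_novel_interest` stands in for the novelty-focused LLM and `alignment_score` for the user-aligned LLM. Both names, the toy interest list, and the word-overlap scorer are hypothetical placeholders, not the paper's models or API.

```python
import random
from typing import Callable, List, Sequence


def best_of_n(
    history: Sequence[str],
    sample_novel_interest: Callable[[Sequence[str], random.Random], str],
    alignment_score: Callable[[Sequence[str], str], float],
    n: int = 8,
    seed: int = 0,
) -> str:
    """Sample n candidates from the novelty model; return the one the aligned model scores highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    # Inference-time scaling: spend more compute by drawing more novelty samples.
    candidates: List[str] = [sample_novel_interest(history, rng) for _ in range(n)]
    # The user-aligned model only ranks candidates; it never generates them.
    # max() keeps the first candidate among ties.
    return max(candidates, key=lambda c: alignment_score(history, c))


# Hypothetical stand-ins for the two LLMs, for illustration only.
INTERESTS = ["jazz piano", "bird watching", "sourdough baking", "chess openings"]


def toy_novelty_sampler(history: Sequence[str], rng: random.Random) -> str:
    # Proposes an interest the user has not engaged with yet (falls back to all if none remain).
    unseen = [i for i in INTERESTS if i not in history] or INTERESTS
    return rng.choice(unseen)


def toy_alignment_score(history: Sequence[str], candidate: str) -> float:
    # Counts words shared with the user's history as a crude relevance signal.
    seen_words = {w for h in history for w in h.split()}
    return float(len(seen_words & set(candidate.split())))


if __name__ == "__main__":
    history = ["classical piano", "chess endgames"]
    print(best_of_n(history, toy_novelty_sampler, toy_alignment_score, n=8))
```

Raising `n` is the inference-time-scaling knob in this sketch: each request costs n novelty-model calls plus n scoring calls, in exchange for a larger pool for the aligned model to choose from.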

Problem

Research questions and friction points this paper is trying to address.

Aligning LLMs with user preferences in recommendations

Balancing novelty and relevancy in large-scale systems

Decoupling training objectives for exploration and alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical planning with LLM inference-time scaling

Decoupling novelty and user-alignment via separate LLMs

Best-of-n predictions using user-aligned LLM selection
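
The hierarchical planning idea can be sketched as two levels, under an assumption of ours for illustration: the LLMs plan at the level of interest clusters, while a conventional ranker picks concrete items inside the chosen cluster. `plan_and_retrieve`, the catalog layout, and `item_score` are hypothetical; `cluster_candidates` plays the role of the n samples drawn from the novelty-focused LLM.

```python
from typing import Callable, Dict, List, Sequence


def plan_and_retrieve(
    history: Sequence[str],
    cluster_candidates: Sequence[str],
    alignment_score: Callable[[Sequence[str], str], float],
    catalog: Dict[str, List[str]],
    item_score: Callable[[str], float],
    k: int = 3,
) -> List[str]:
    """Hypothetical two-level planner: pick an interest cluster, then the top-k items inside it."""
    if not cluster_candidates:
        raise ValueError("need at least one candidate cluster")
    # Level 1 (planning): among the novelty model's candidate clusters,
    # keep the one the user-aligned scorer rates highest.
    cluster = max(cluster_candidates, key=lambda c: alignment_score(history, c))
    # Level 2 (retrieval): only items in the chosen cluster are eligible,
    # ranked by an ordinary item-level score.
    items = catalog.get(cluster, [])
    return sorted(items, key=item_score, reverse=True)[:k]


if __name__ == "__main__":
    catalog = {
        "jazz": ["Kind of Blue", "A Love Supreme", "Time Out"],
        "chess": ["Opening principles", "Endgame basics"],
    }
    popularity = {"Kind of Blue": 0.9, "A Love Supreme": 0.7, "Time Out": 0.8,
                  "Opening principles": 0.6, "Endgame basics": 0.5}
    picks = plan_and_retrieve(
        history=["classical piano"],
        cluster_candidates=["chess", "jazz"],
        alignment_score=lambda h, c: 1.0 if c == "jazz" else 0.0,  # toy scorer
        catalog=catalog,
        item_score=popularity.__getitem__,
        k=2,
    )
    print(picks)  # prints ['Kind of Blue', 'Time Out']
```

Restricting level 2 to the planned cluster is one way to keep the downstream ranker from pulling recommendations back toward the user's existing interests.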

🔎 Similar Papers

Direct Preference Optimization for LLM-Enhanced Recommendation Systems