Process-Supervised LLM Recommenders via Flow-guided Tuning

📅 2025-03-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the popularity bias inherent in supervised fine-tuning (SFT) for large language model (LLM)-based recommendation, which arises from maximizing token-level likelihood. We propose a process-supervision paradigm grounded in Generative Flow Networks (GFlowNets). Our core innovation is decomposing item-level rewards into token-level rewards, which enables fine-grained control over generation paths via token-level reward propagation. To jointly optimize fairness, diversity, and personalization, we integrate empirical distribution matching with proportional sampling. Experiments show that our method outperforms standard SFT in recommendation accuracy, fairness, and diversity. The implementation is publicly available.

📝 Abstract

While large language models (LLMs) are increasingly adapted for recommendation systems via supervised fine-tuning (SFT), this approach amplifies popularity bias due to its likelihood maximization objective, compromising recommendation diversity and fairness. To address this, we present Flow-guided fine-tuning recommender (Flower), which replaces SFT with a Generative Flow Network (GFlowNet) framework that enacts process supervision through token-level reward propagation. Flower's key innovation lies in decomposing item-level rewards into constituent token rewards, enabling direct alignment between token generation probabilities and their reward signals. This mechanism achieves three critical advancements: (1) popularity bias mitigation and fairness enhancement through empirical distribution matching, (2) preservation of diversity through GFlowNet's proportional sampling, and (3) flexible integration of personalized preferences via adaptable token rewards. Experiments demonstrate Flower's superior distribution-fitting capability and its significant advantages over traditional SFT in terms of fairness, diversity, and accuracy, highlighting its potential to improve LLM-based recommendation systems. The implementation is available via https://github.com/Mr-Peach0301/Flower
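The decomposition of item-level rewards into token rewards can be illustrated with a small sketch. It is not taken from the Flower repository: the toy catalogue, the function names, and the prefix-tree formulation (a prefix's flow is the summed reward of all items that extend it) are assumptions made for illustration, and the sketch further assumes that no item title is a prefix of another.

```python
from collections import defaultdict

# Hypothetical toy catalogue: tokenized item title -> item-level reward
# (for example, the item's empirical frequency). Assumes no title is a
# prefix of another title.
ITEM_REWARDS = {
    ("star", "wars"): 6.0,
    ("star", "trek"): 3.0,
    ("alien",): 1.0,
}


def prefix_flows(item_rewards):
    """Give every token prefix the summed reward of the items extending it."""
    flows = defaultdict(float)
    for tokens, reward in item_rewards.items():
        for i in range(len(tokens) + 1):
            flows[tokens[:i]] += reward
    return dict(flows)


def token_policy(flows, prefix):
    """Next-token probabilities: flow of the child prefix over flow of the parent."""
    k = len(prefix)
    total = flows[prefix]
    return {
        p[k]: f / total
        for p, f in flows.items()
        if len(p) == k + 1 and p[:k] == prefix
    }


def item_probability(flows, tokens):
    """Probability of generating the full item token by token."""
    prob = 1.0
    for i in range(len(tokens)):
        prob *= token_policy(flows, tokens[:i])[tokens[i]]
    return prob


FLOWS = prefix_flows(ITEM_REWARDS)
```

In this toy setup, `item_probability(FLOWS, ("star", "wars"))` comes out at roughly 0.6, which is the item's share of the total reward (6 out of 10). Sampling token by token therefore reproduces the reward distribution over items rather than collapsing onto the single most popular title.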

Problem

Research questions and friction points this paper is trying to address.

SFT amplifies popularity bias in LLM-based recommenders because it maximizes likelihood

The amplified bias compromises recommendation diversity and fairness

How to align token generation probabilities with reward signals during fine-tuning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Flow Network replaces supervised fine-tuning

Token-level reward propagation enhances fairness and diversity

Empirical distribution matching mitigates popularity bias
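Why token-level rewards can lead to proportional sampling is easiest to see in a short derivation. This is a sketch under assumptions not stated on this page: items are token sequences $x = (t_1, \dots, t_n)$ arranged in a prefix tree, no item is a proper prefix of another, and the symbols $R$ (item reward), $F$ (prefix flow), and $\pi$ (token policy) are illustrative notation rather than the paper's own.

```latex
\begin{align*}
F(s) &= \sum_{x \,:\, s \text{ is a prefix of } x} R(x), \\
\pi(t \mid s) &= \frac{F(s \cdot t)}{F(s)}, \\
P(x) &= \prod_{i=1}^{n} \pi(t_i \mid t_{1:i-1})
      = \prod_{i=1}^{n} \frac{F(t_{1:i})}{F(t_{1:i-1})}
      = \frac{F(t_{1:n})}{F(\varnothing)}
      = \frac{R(x)}{\sum_{x'} R(x')}.
\end{align*}
```

The product telescopes, so each item is generated with probability proportional to its reward. If $R(x)$ is chosen as the item's empirical frequency, the generated distribution matches the empirical one instead of concentrating on the most likely item.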
