Generative Pseudo-Labeling for Pre-Ranking with LLMs

📅 2026-02-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the sample selection bias inherent in the pre-ranking stage, which is trained only on observed interaction data and therefore generalizes poorly to long-tail items. To mitigate this, the authors propose a large language model (LLM)-based approach that offline generates user-specific interest anchors and matches them against candidate items in a frozen semantic embedding space, producing unbiased, content-aware pseudo-labels for user–item pairs. This aligns the training distribution with the online serving distribution without adding serving latency. The authors present this as the first method to use LLMs to produce high-quality pseudo-labels for pre-ranking, avoiding the mislabeling and bias propagation commonly introduced by conventional negative sampling or knowledge distillation. Deployed in a large-scale production system, the approach improves click-through rate by 3.07% and significantly enhances recommendation diversity and long-tail item discovery.

πŸ“ Abstract
Pre-ranking is a critical stage in industrial recommendation systems, tasked with efficiently scoring thousands of recalled items for downstream ranking. A key challenge is the train-serving discrepancy: pre-ranking models are trained only on exposed interactions, yet must score all recalled candidates -- including unexposed items -- during online serving. This mismatch not only induces severe sample selection bias but also degrades generalization, especially for long-tail content. Existing debiasing approaches typically rely on heuristics (e.g., negative sampling) or distillation from biased rankers, which either mislabel plausible unexposed items as negatives or propagate exposure bias into pseudo-labels. In this work, we propose Generative Pseudo-Labeling (GPL), a framework that leverages large language models (LLMs) to generate unbiased, content-aware pseudo-labels for unexposed items, explicitly aligning the training distribution with the online serving space. By offline generating user-specific interest anchors and matching them with candidates in a frozen semantic space, GPL provides high-quality supervision without adding online latency. Deployed in a large-scale production system, GPL improves click-through rate by 3.07%, while significantly enhancing recommendation diversity and long-tail item discovery.
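The matching step described in the abstract, where LLM-generated interest anchors are compared against recalled candidates in a frozen semantic space to produce supervision labels, can be sketched as follows. This is an illustrative assumption, not the paper's implementation: the function name, embedding shapes, and similarity threshold are all hypothetical.

```python
import numpy as np

# Hypothetical sketch of the anchor-candidate matching step: user interest
# "anchors" (generated offline by an LLM) are compared with recalled
# candidates in a frozen semantic embedding space; candidates whose best
# anchor similarity clears a threshold receive a positive pseudo-label.
# Names and the threshold value are illustrative, not from the paper.

def pseudo_label(anchor_embs: np.ndarray,
                 candidate_embs: np.ndarray,
                 threshold: float = 0.6) -> np.ndarray:
    """Return a 0/1 pseudo-label per candidate item.

    anchor_embs:    (A, d) embeddings of LLM-generated interest anchors
    candidate_embs: (C, d) embeddings of recalled (possibly unexposed) items
    """
    # L2-normalize so the dot product equals cosine similarity.
    a = anchor_embs / np.linalg.norm(anchor_embs, axis=1, keepdims=True)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = c @ a.T                  # (C, A) cosine similarities
    best = sims.max(axis=1)         # best-matching anchor per candidate
    return (best >= threshold).astype(np.int64)

# Toy usage: two anchors, three candidates in a 4-dim space.
anchors = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]])
cands = np.array([[0.9, 0.1, 0.0, 0.0],   # near anchor 0 -> positive
                  [0.0, 0.0, 1.0, 0.0],   # unrelated     -> negative
                  [0.1, 0.9, 0.0, 0.0]])  # near anchor 1 -> positive
labels = pseudo_label(anchors, cands)
print(labels.tolist())  # [1, 0, 1]
```

Because both the anchors and the embedding space are fixed offline, these labels can be precomputed and consumed as ordinary training targets, which is consistent with the abstract's claim of adding no online latency.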
Problem

Research questions and friction points this paper is trying to address.

pre-ranking
train-serving discrepancy
sample selection bias
long-tail items
pseudo-labeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Pseudo-Labeling
Large Language Models
Pre-ranking
Debiasing
Long-tail Recommendation