Generative Pseudo-Labeling for Pre-Ranking with LLMs

📅 2026-02-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the sample selection bias inherent in the pre-ranking stage, which is trained only on observed interaction data and therefore generalizes poorly to long-tail items. To mitigate this, the authors propose a large language model (LLM)-based approach that offline generates user-specific interest anchors and matches them against candidate items in a frozen semantic embedding space, producing unbiased, content-aware pseudo-labels for user–item pairs. This aligns the training distribution with the online serving distribution without adding serving latency. The authors present this as the first method to use LLMs to produce high-quality pseudo-labels for pre-ranking, avoiding the mislabeling and bias propagation commonly introduced by conventional negative sampling or knowledge distillation. Deployed in a large-scale production system, the approach improves click-through rate by 3.07% and significantly enhances recommendation diversity and long-tail item discovery.

πŸ“ Abstract
Pre-ranking is a critical stage in industrial recommendation systems, tasked with efficiently scoring thousands of recalled items for downstream ranking. A key challenge is the train-serving discrepancy: pre-ranking models are trained only on exposed interactions, yet must score all recalled candidates -- including unexposed items -- during online serving. This mismatch not only induces severe sample selection bias but also degrades generalization, especially for long-tail content. Existing debiasing approaches typically rely on heuristics (e.g., negative sampling) or distillation from biased rankers, which either mislabel plausible unexposed items as negatives or propagate exposure bias into pseudo-labels. In this work, we propose Generative Pseudo-Labeling (GPL), a framework that leverages large language models (LLMs) to generate unbiased, content-aware pseudo-labels for unexposed items, explicitly aligning the training distribution with the online serving space. By offline generating user-specific interest anchors and matching them with candidates in a frozen semantic space, GPL provides high-quality supervision without adding online latency. Deployed in a large-scale production system, GPL improves click-through rate by 3.07%, while significantly enhancing recommendation diversity and long-tail item discovery.
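The matching step described in the abstract, where LLM-generated interest anchors are compared against recalled candidates in a frozen semantic space to produce supervision labels, can be sketched as follows. This is an illustrative assumption, not the paper's implementation: the function name, embedding shapes, and similarity threshold are all hypothetical.

```python
import numpy as np

# Hypothetical sketch of the anchor-candidate matching step: user interest
# "anchors" (generated offline by an LLM) are compared with recalled
# candidates in a frozen semantic embedding space; candidates whose best
# anchor similarity clears a threshold receive a positive pseudo-label.
# Names and the threshold value are illustrative, not from the paper.

def pseudo_label(anchor_embs: np.ndarray,
                 candidate_embs: np.ndarray,
                 threshold: float = 0.6) -> np.ndarray:
    """Return a 0/1 pseudo-label per candidate item.

    anchor_embs:    (A, d) embeddings of LLM-generated interest anchors
    candidate_embs: (C, d) embeddings of recalled (possibly unexposed) items
    """
    # L2-normalize so the dot product equals cosine similarity.
    a = anchor_embs / np.linalg.norm(anchor_embs, axis=1, keepdims=True)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = c @ a.T                  # (C, A) cosine similarities
    best = sims.max(axis=1)         # best-matching anchor per candidate
    return (best >= threshold).astype(np.int64)

# Toy usage: two anchors, three candidates in a 4-dim space.
anchors = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]])
cands = np.array([[0.9, 0.1, 0.0, 0.0],   # near anchor 0 -> positive
                  [0.0, 0.0, 1.0, 0.0],   # unrelated     -> negative
                  [0.1, 0.9, 0.0, 0.0]])  # near anchor 1 -> positive
labels = pseudo_label(anchors, cands)
print(labels.tolist())  # [1, 0, 1]
```

Because both the anchors and the embedding space are fixed offline, these labels can be precomputed and consumed as ordinary training targets, which is consistent with the abstract's claim of adding no online latency.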
Problem

Research questions and friction points this paper is trying to address.

pre-ranking
train-serving discrepancy
sample selection bias
long-tail items
pseudo-labeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative Pseudo-Labeling
Large Language Models
Pre-ranking
Debiasing
Long-tail Recommendation