ChoirRec: Semantic User Grouping via LLMs for Conversion Rate Prediction of Low-Activity Users

Date: 2025-10-10
Citations: 0
Influential citations: 0

AI Summary
ChoirRec targets three challenges in large-scale e-commerce CVR prediction for low-activity users: noisy behavioral signals, sparse user interactions, and model bias toward high-activity users. The framework works in three steps:

- it uses large language models to construct semantic user groups, enabling fine-grained semantic clustering;
- it uses a dual-channel, group-aware architecture that transfers knowledge across users through hierarchical user representations and multi-granularity feature fusion;
- it adds an adaptive fusion mechanism to mitigate signal noise and training bias.

On the Taobao platform, ChoirRec improves offline GAUC by 1.16% and lifts order volume by 7.24% in online A/B testing. These results indicate that the framework improves both the accuracy and the generalizability of CVR prediction for low-activity users.
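The summary does not say how the adaptive fusion mechanism computes its weights. The sketch below is a rough illustration only. It assumes a scalar gate driven by the user's interaction count, so low-activity users lean on the group channel and high-activity users lean on their own channel. `activity_gate`, `fuse_channels`, and the `slope`/`bias` values are hypothetical stand-ins for learned parameters, not ChoirRec's actual design.

```python
import math
from typing import List, Sequence, Tuple


def activity_gate(num_interactions: int, slope: float = 1.0, bias: float = -2.0) -> float:
    """Weight placed on the individual-user channel (hypothetical form).

    gate = sigmoid(slope * log(1 + n) + bias). With slope > 0 the gate rises
    with the interaction count n. slope and bias stand in for learned weights.
    """
    z = slope * math.log1p(num_interactions) + bias
    return 1.0 / (1.0 + math.exp(-z))


def fuse_channels(
    user_repr: Sequence[float],
    group_repr: Sequence[float],
    num_interactions: int,
    slope: float = 1.0,
    bias: float = -2.0,
) -> Tuple[List[float], float]:
    """Convex combination of the individual channel and the group channel."""
    if len(user_repr) != len(group_repr):
        raise ValueError("both channels must have the same dimension")
    g = activity_gate(num_interactions, slope, bias)
    # g weights the user's own signal; (1 - g) weights the group-level signal.
    fused = [g * u + (1.0 - g) * v for u, v in zip(user_repr, group_repr)]
    return fused, g


if __name__ == "__main__":
    user_vec, group_vec = [1.0, 0.0], [0.0, 1.0]
    for n in (0, 10, 1000):
        fused, g = fuse_channels(user_vec, group_vec, n)
        print(f"interactions={n:5d} gate={g:.3f} fused={[round(x, 3) for x in fused]}")
```

A trained model would learn the gate jointly from many features. A single activity input is used here only because it makes the intended behavior easy to see: a user with no history is represented mostly by the group channel.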

๐Ÿ“ Abstract
Accurately predicting conversion rates (CVR) for low-activity users remains a fundamental challenge in large-scale e-commerce recommender systems. Existing approaches face three critical limitations: (i) reliance on noisy and unreliable behavioral signals; (ii) insufficient user-level information due to the lack of diverse interaction data; and (iii) a systemic training bias toward high-activity users that overshadows the needs of low-activity users. To address these challenges, we propose ChoirRec, a novel framework that leverages the semantic capabilities of Large Language Models (LLMs) to construct semantic user groups and enhance CVR prediction for low-activity users. Built around a dual-channel architecture for robust cross-user knowledge transfer, ChoirRec comprises three components: (i) a Semantic Group Generation module that uses LLMs to form reliable, cross-activity user clusters, thereby filtering out noisy signals; (ii) a Group-aware Hierarchical Representation module that enriches sparse user embeddings with informative group-level priors to mitigate data insufficiency; and (iii) a Group-aware Multi-granularity Module that employs a dual-channel architecture and an adaptive fusion mechanism to ensure effective learning and utilization of group knowledge. We conduct extensive offline and online experiments on Taobao, a leading industrial-scale e-commerce platform. ChoirRec improves GAUC by 1.16% in offline evaluations, while online A/B testing reveals a 7.24% increase in order volume, highlighting its substantial practical value in real-world applications.
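The abstract reports its offline gain in GAUC but does not define the metric. A definition common in industrial recommender work computes AUC separately for each user. It then averages those per-user values, weighting each user by their number of impressions and skipping users whose samples are all positive or all negative. The sketch below implements that definition. Whether ChoirRec weights by impressions, clicks, or something else is an assumption, since the summary does not state it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def auc(labels: List[int], scores: List[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gauc(records: Iterable[Tuple[str, int, float]]) -> float:
    """Impression-weighted mean of per-user AUC over (user_id, label, score) records."""
    per_user: Dict[str, Tuple[List[int], List[float]]] = defaultdict(lambda: ([], []))
    for user_id, label, score in records:
        labels, scores = per_user[user_id]
        labels.append(label)
        scores.append(score)

    weighted_sum, total_weight = 0.0, 0
    for labels, scores in per_user.values():
        # Skip users with a single label class: their AUC is undefined.
        if 0 < sum(labels) < len(labels):
            weight = len(labels)  # assumed weight: number of impressions
            weighted_sum += weight * auc(labels, scores)
            total_weight += weight
    if total_weight == 0:
        raise ValueError("no user has both positive and negative samples")
    return weighted_sum / total_weight


if __name__ == "__main__":
    records = [
        ("u1", 1, 0.9), ("u1", 0, 0.2), ("u1", 0, 0.4),  # per-user AUC 1.0, weight 3
        ("u2", 1, 0.3), ("u2", 0, 0.6),                   # per-user AUC 0.0, weight 2
        ("u3", 0, 0.5),                                   # skipped: negatives only
    ]
    print(gauc(records))
```

Averaging per user rather than over all samples keeps heavy users from dominating the metric. This fits a paper focused on low-activity users.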
Problem

Research questions and friction points this paper is trying to address.

Predicting conversion rates for low-activity e-commerce users
Overcoming noisy behavioral signals and insufficient user data
Addressing systemic training bias toward high-activity users
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages LLMs to create semantic user groups
Uses dual-channel architecture for cross-user knowledge transfer
Enriches sparse embeddings with group-level priors
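The list above says ChoirRec enriches sparse embeddings with group-level priors, but it gives no formula. The sketch below assumes one simple reading:

- Each group's prior is an activity-weighted centroid of its members' embeddings, so high-activity members of a semantic group shape the prior that low-activity members receive.
- The hierarchical representation is the user embedding concatenated with that prior.

The group labels stand in for the output of the LLM-based grouping step and are hard-coded here. `group_priors`, `hierarchical_repr`, and the `1 + activity` weighting rule are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_priors(
    embeddings: Dict[str, Sequence[float]],
    groups: Dict[str, str],
    activity: Dict[str, int],
) -> Dict[str, List[float]]:
    """Activity-weighted centroid of member embeddings for each group (hypothetical rule)."""
    sums: Dict[str, List[float]] = {}
    weights: Dict[str, float] = defaultdict(float)
    for user_id, emb in embeddings.items():
        gid = groups[user_id]  # assumed to come from an upstream LLM grouping step
        # +1 so members with zero recorded activity still contribute a little.
        w = 1.0 + activity.get(user_id, 0)
        acc = sums.setdefault(gid, [0.0] * len(emb))
        for i, x in enumerate(emb):
            acc[i] += w * x
        weights[gid] += w
    return {gid: [x / weights[gid] for x in acc] for gid, acc in sums.items()}


def hierarchical_repr(
    user_id: str,
    embeddings: Dict[str, Sequence[float]],
    groups: Dict[str, str],
    priors: Dict[str, List[float]],
) -> List[float]:
    """User-level embedding followed by the prior of the user's group."""
    return list(embeddings[user_id]) + priors[groups[user_id]]


if __name__ == "__main__":
    embeddings = {"heavy_buyer": [1.0, 0.0], "new_user": [0.0, 0.0], "other": [0.0, 1.0]}
    groups = {"heavy_buyer": "outdoor_gear", "new_user": "outdoor_gear", "other": "baby_care"}
    activity = {"heavy_buyer": 9, "new_user": 0, "other": 1}
    priors = group_priors(embeddings, groups, activity)
    print(hierarchical_repr("new_user", embeddings, groups, priors))
```

In this toy example, `new_user` has an all-zero embedding. Its hierarchical representation still carries the `outdoor_gear` prior, which is dominated by `heavy_buyer`. This is the kind of cross-user transfer the list describes.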