ChoirRec: Semantic User Grouping via LLMs for Conversion Rate Prediction of Low-Activity Users

📅 2025-10-10
🤖 AI Summary
To address three key challenges in large-scale e-commerce CVR prediction for low-activity users—noisy behavioral signals, sparse user interactions, and model bias toward high-activity users—this paper proposes ChoirRec. Methodologically, it leverages large language models to construct semantic user cohorts, enabling fine-grained semantic clustering; designs a dual-channel cohort-aware architecture that supports cross-user knowledge transfer via hierarchical user representations and multi-granularity feature fusion; and introduces an adaptive fusion mechanism to mitigate signal noise and training bias. Evaluated on the Taobao platform, ChoirRec achieves a 1.16% offline GAUC improvement and a 7.24% online A/B test lift in order volume. The framework significantly enhances both the accuracy and generalizability of CVR prediction for low-activity users.

📝 Abstract
Accurately predicting conversion rates (CVR) for low-activity users remains a fundamental challenge in large-scale e-commerce recommender systems. Existing approaches face three critical limitations: (i) reliance on noisy and unreliable behavioral signals; (ii) insufficient user-level information due to the lack of diverse interaction data; and (iii) a systemic training bias toward high-activity users that overshadows the needs of low-activity users. To address these challenges, we propose ChoirRec, a novel framework that leverages the semantic capabilities of Large Language Models (LLMs) to construct semantic user groups and enhance CVR prediction for low-activity users. With a dual-channel architecture designed for robust cross-user knowledge transfer, ChoirRec comprises three components: (i) a Semantic Group Generation module that utilizes LLMs to form reliable, cross-activity user clusters, thereby filtering out noisy signals; (ii) a Group-aware Hierarchical Representation module that enriches sparse user embeddings with informative group-level priors to mitigate data insufficiency; and (iii) a Group-aware Multi-granularity Module that employs a dual-channel architecture and an adaptive fusion mechanism to ensure effective learning and utilization of group knowledge. We conduct extensive offline and online experiments on Taobao, a leading industrial-scale e-commerce platform. ChoirRec improves GAUC by 1.16% in offline evaluations, while online A/B testing reveals a 7.24% increase in order volume, highlighting its substantial practical value in real-world applications.
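The abstract reports its offline gain in GAUC without defining the metric. As a rough illustration only, GAUC is commonly computed as a per-user AUC averaged with weights proportional to each user's number of samples, skipping users whose labels are all positive or all negative. The sketch below follows that common convention; the weighting scheme and the function names `auc` and `gauc` are assumptions for illustration, not details taken from the paper.

```python
from collections import defaultdict


def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Returns None when one class is missing,
    because AUC is undefined in that case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gauc(user_ids, labels, scores):
    """Sample-count-weighted average of per-user AUC (assumed weighting)."""
    by_user = defaultdict(lambda: ([], []))
    for u, y, s in zip(user_ids, labels, scores):
        by_user[u][0].append(y)
        by_user[u][1].append(s)

    weighted_sum = 0.0
    total_weight = 0
    for user_labels, user_scores in by_user.values():
        user_auc = auc(user_labels, user_scores)
        if user_auc is None:
            continue  # single-class users carry no ranking information
        weighted_sum += len(user_labels) * user_auc
        total_weight += len(user_labels)
    return weighted_sum / total_weight if total_weight else None


# Toy data: u1 is ranked perfectly, u2 half right, u3 has no positives.
users = ["u1", "u1", "u2", "u2", "u2", "u3", "u3"]
labels = [1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.1, 0.2, 0.5, 0.1, 0.3, 0.4]
result = gauc(users, labels, scores)
```

Because the metric scores ranking quality within each user rather than across users, it is less dominated by high-activity users than a global AUC, which is presumably why it suits an evaluation focused on low-activity users.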
Problem

Research questions and friction points this paper is trying to address.

Predicting conversion rates for low-activity e-commerce users
Overcoming noisy behavioral signals and insufficient user data
Addressing systemic training bias toward high-activity users
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages LLMs to create semantic user groups
Uses dual-channel architecture for cross-user knowledge transfer
Enriches sparse embeddings with group-level priors
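The idea of enriching a sparse user embedding with a group-level prior can be pictured as a gated blend of two vectors. The sketch below is a generic gated-fusion example written under stated assumptions: the scalar sigmoid gate, the function name `fuse_with_group_prior`, and the parameters `gate_weights` and `gate_bias` are all hypothetical and are not taken from the paper, which does not spell out its fusion mechanism in this summary.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_with_group_prior(user_vec, group_vec, gate_weights, gate_bias=0.0):
    """Blend a user embedding with its group embedding via a scalar gate.

    gate = sigmoid(w . [user; group] + b)
    fused = gate * user + (1 - gate) * group

    A gate near 1 trusts the user's own embedding; a gate near 0 falls
    back on the group prior. All parameters here are hypothetical.
    """
    if len(user_vec) != len(group_vec):
        raise ValueError("user and group embeddings must have equal length")
    if len(gate_weights) != 2 * len(user_vec):
        raise ValueError("gate_weights must cover the concatenated vector")

    concat = list(user_vec) + list(group_vec)
    gate = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    fused = [gate * u + (1.0 - gate) * g for u, g in zip(user_vec, group_vec)]
    return fused, gate


user = [1.0, 0.0]
group = [0.0, 1.0]

# Zero weights and zero bias give gate = 0.5, i.e. a plain average.
balanced, balanced_gate = fuse_with_group_prior(user, group, [0.0] * 4)

# A strongly negative bias pushes the gate toward 0, so the result
# is dominated by the group prior (the low-activity-user case).
prior_heavy, prior_gate = fuse_with_group_prior(user, group, [0.0] * 4, gate_bias=-20.0)
```

In a trained model the gate parameters would be learned, so that users with little history lean more on their group while active users rely on their own signals; here they are fixed by hand purely to show the two extremes.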
Dakai Zhai
Alibaba Group, Hangzhou, China
Jiong Gao
Alibaba Group, Hangzhou, China
Boya Du
Alibaba Group, Hangzhou, China
Junwei Xu
SIGS, Tsinghua University, Shenzhen, China
Qijie Shen
Alibaba Group
Recommender System, Graph Neural Networks, Large Language Model
Jialin Zhu
Researcher in Deep Learning, Computer Graphics and Computer Vision
Deep Learning, Computer Graphics, Computer Vision
Yuning Jiang
Alibaba Group, Hangzhou, China