Learning User Interests via Reasoning and Distillation for Cross-Domain News Recommendation

πŸ“… 2026-02-16
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the challenge of cross-domain news recommendation, where capturing deep user interest signals from heterogeneous behaviors while ensuring system scalability remains difficult. The authors propose a novel framework that integrates reinforcement learning with large language models, introducing a multi-reward policy optimization approach to generate high-quality, interest-driven news search queries for modeling cross-domain user interests. To enable efficient deployment, they employ online policy distillation to transfer the large model’s policy to a lightweight student model. Extensive offline experiments, ablation studies, and large-scale online A/B tests demonstrate that the proposed method significantly improves interest modeling fidelity and recommendation performance, with gains scaling consistently with increased sampling size and model capacity.

πŸ“ Abstract
News recommendation plays a critical role in online news platforms by helping users discover relevant content. Cross-domain news recommendation further requires inferring users' underlying information needs from heterogeneous signals that often extend beyond direct news consumption. A key challenge lies in moving beyond surface-level behaviors to capture deeper, reusable user interests while maintaining scalability in large-scale production systems. In this paper, we present a reinforcement learning framework that trains large language models to generate high-quality lists of interest-driven news search queries from cross-domain user signals. We formulate query-list generation as a policy optimization problem and employ GRPO with multiple reward signals. We systematically study two compute dimensions: inference-time sampling and model capacity, and empirically observe consistent improvements with increased compute that exhibit scaling-like behavior. Finally, we perform on-policy distillation to transfer the learned policy from a large, compute-intensive teacher to a compact student model suitable for scalable deployment. Extensive offline experiments, ablation studies, and large-scale online A/B tests in a production news recommendation system demonstrate consistent gains in both interest modeling quality and downstream recommendation performance.
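The group-relative credit assignment at the heart of GRPO, combined with multiple reward signals, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the reward names (`relevance`, `diversity`) and the plain-sum aggregation are placeholder assumptions, and the paper's actual reward functions are not specified here.

```python
import statistics

def grpo_advantages(reward_lists):
    """Group-relative advantages for a group of sampled query lists.

    reward_lists: one dict of reward signals per sampled output in the
    group, e.g. {"relevance": 0.9, "diversity": 0.5} (names illustrative).
    Each sample's rewards are summed, then normalized against the group
    mean and standard deviation, so advantages are relative to the group.
    """
    totals = [sum(r.values()) for r in reward_lists]
    mean = statistics.mean(totals)
    std = statistics.pstdev(totals) or 1.0  # avoid divide-by-zero on ties
    return [(t - mean) / std for t in totals]
```

A group with totals 1.4, 1.0, and 0.6 yields advantages that sum to zero, with the highest-reward sample receiving the largest positive advantage; these advantages would then weight the policy-gradient update for each sampled query list.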
Problem

Research questions and friction points this paper is trying to address.

cross-domain news recommendation
user interests
scalability
interest modeling
heterogeneous signals
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-domain recommendation
large language models
reinforcement learning
policy distillation
query generation
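The on-policy (policy) distillation listed above, in which the student's own samples are scored against the teacher, is commonly framed as a per-token reverse-KL objective over sequences the student generates. The sketch below is a generic illustration of that idea under that assumption, not the paper's exact loss:

```python
import math

def reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher) for one token's vocabulary distribution."""
    return sum(s * math.log(s / t)
               for s, t in zip(student_probs, teacher_probs) if s > 0)

def distill_loss(student_dists, teacher_dists):
    """Mean per-token reverse KL along a student-sampled sequence.

    Both arguments are lists of per-token probability distributions,
    aligned position by position; the teacher's distributions are
    evaluated on the tokens the student itself sampled (on-policy).
    """
    per_token = [reverse_kl(s, t)
                 for s, t in zip(student_dists, teacher_dists)]
    return sum(per_token) / len(per_token)
```

Because the expectation is taken under the student's own sampling distribution, the student is corrected exactly on the trajectories it would produce at deployment, which is the usual motivation for on-policy over offline distillation.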
Mengdan Zhu
Microsoft, Emory University
Yufan Zhao
Microsoft
Tao Di
Microsoft
Yulan Yan
Microsoft
Liang Zhao
Winship Distinguished Professor & Associate Professor, Emory University
data mining, machine learning, spatial data mining, graph neural networks, generative AI