DiffuReason: Bridging Latent Reasoning and Generative Refinement for Sequential Recommendation

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two limitations of existing sequential recommendation methods. First, they rely on deterministic latent reasoning chains that are prone to noise accumulation and fail to capture the inherent uncertainty in user intent. Second, their staged training hinders joint optimization of reasoning and generation. To overcome these challenges, the authors propose DiffuReason, a framework that introduces diffusion mechanisms into latent reasoning for the first time, modeling user intent as a probabilistic distribution so that intermediate representations can be denoised. DiffuReason strengthens reasoning through multi-step Thinking Tokens and aligns the reasoning and refinement modules end to end via Group Relative Policy Optimization (GRPO). Experiments on four benchmark datasets show consistent gains across diverse backbone models, and large-scale industrial A/B tests support its effectiveness in real-world recommendation scenarios.
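
To make "modeling user intent as a probabilistic distribution" concrete, the sketch below shows the standard DDPM-style forward noising step, in which a clean vector is blended with Gaussian noise according to a schedule. It is only an illustration under stated assumptions: the summary does not specify the noise schedule or parameterization DiffuReason uses, and the names `linear_alpha_bars` and `noise_intent`, the linear beta schedule, and the toy intent vector are all hypothetical.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    The linear schedule and its endpoints are assumptions for illustration;
    they are not taken from the DiffuReason paper.
    """
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_intent(x0, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I).

    x0 stands in for an intent representation (hypothetical toy vector).
    """
    scale = math.sqrt(alpha_bar)
    sigma = math.sqrt(1.0 - alpha_bar)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = linear_alpha_bars(num_steps=10)
    intent = [0.5, -1.0, 2.0]  # toy intent hypothesis
    print(noise_intent(intent, alpha_bars[-1], rng))
```

A denoising model would be trained to invert this corruption step by step; that learned reverse process is what a diffusion-based refinement stage would apply to the initial intent hypothesis.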

📝 Abstract
Latent reasoning has emerged as a promising paradigm for sequential recommendation, enabling models to capture complex user intent through multi-step deliberation. Yet existing approaches often rely on deterministic latent chains that accumulate noise and overlook the uncertainty inherent in user intent, and they are typically trained in staged pipelines that hinder joint optimization and exploration. To address these challenges, we propose DiffuReason, a unified "Think-then-Diffuse" framework for sequential recommendation. It integrates multi-step Thinking Tokens for latent reasoning, diffusion-based refinement for denoising intermediate representations, and end-to-end Group Relative Policy Optimization (GRPO) alignment to optimize for ranking performance. In the Think stage, the model generates Thinking Tokens that reason over user history to form an initial intent hypothesis. In the Diffuse stage, rather than treating this hypothesis as the final output, we refine it through a diffusion process that models user intent as a probabilistic distribution, providing iterative denoising against reasoning noise. Finally, GRPO-based reinforcement learning enables the reasoning and refinement modules to co-evolve throughout training, without the constraints of staged optimization. Extensive experiments on four benchmarks demonstrate that DiffuReason consistently improves diverse backbone architectures. Online A/B tests on a large-scale industrial platform further validate its practical effectiveness.
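
The GRPO step mentioned above can be illustrated with its core computation: several candidates are sampled for the same input, each receives a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. The following is a minimal sketch of that normalization only. The function name `group_relative_advantages` and the example rewards are hypothetical, and the abstract does not state which ranking reward DiffuReason uses.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group, as in GRPO.

    rewards: scalar rewards for candidates sampled for the same user history
             (for example a ranking metric; the exact reward is an assumption).
    eps: small constant that avoids division by zero when all rewards match.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four hypothetical refined candidates for one user; reward 1.0 means the
    # target item was ranked at the top, 0.0 means it was not.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Candidates that score above their group average receive positive advantages and are reinforced, while below-average candidates are penalized, so no separate value network is needed to provide a baseline.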
Problem

Research questions and friction points this paper is trying to address.

latent reasoning
sequential recommendation
uncertainty
staged training
noise accumulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent reasoning
diffusion-based refinement
Thinking Tokens
Group Relative Policy Optimization
sequential recommendation
Authors

Jie Jiang
Tencent, Beijing, China
Yang Wu
Tencent
Computer Vision, Machine Learning, Computer Graphics
Qian Li
Tencent, Beijing, China
Yuling Xiong
Tencent, Beijing, China
Yihang Su
Tencent, Beijing, China
Junbang Huo
Tencent, Beijing, China
Longfei Lu
Tencent, Beijing, China
Jun Zhang
Tencent
AI codec, image/video generation, medical image analysis
Huan Yu
Tencent, Beijing, China