DiffuReason: Bridging Latent Reasoning and Generative Refinement for Sequential Recommendation

📅 2026-02-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing latent-reasoning methods for sequential recommendation rely on deterministic reasoning chains that accumulate noise and fail to capture the uncertainty inherent in user intent. They are also trained in staged pipelines, which hinders joint optimization of reasoning and generation. To address these limitations, this work proposes DiffuReason, a framework that introduces diffusion into latent reasoning for the first time, modeling user intent as a probabilistic distribution so that intermediate representations can be denoised. DiffuReason performs reasoning with multi-step Thinking Tokens and aligns the reasoning and refinement modules end to end via Group Relative Policy Optimization (GRPO). Experiments on four benchmark datasets show consistent performance gains across diverse backbone models, and online A/B tests on a large-scale industrial platform confirm its practical effectiveness.

📝 Abstract

Latent reasoning has emerged as a promising paradigm for sequential recommendation, enabling models to capture complex user intent through multi-step deliberation. Yet existing approaches often rely on deterministic latent chains that accumulate noise and overlook the uncertainty inherent in user intent, and they are typically trained in staged pipelines that hinder joint optimization and exploration. To address these challenges, we propose DiffuReason, a unified "Think-then-Diffuse" framework for sequential recommendation. It integrates multi-step Thinking Tokens for latent reasoning, diffusion-based refinement for denoising intermediate representations, and end-to-end Group Relative Policy Optimization (GRPO) alignment to optimize for ranking performance. In the Think stage, the model generates Thinking Tokens that reason over user history to form an initial intent hypothesis. In the Diffuse stage, rather than treating this hypothesis as the final output, we refine it through a diffusion process that models user intent as a probabilistic distribution, providing iterative denoising against reasoning noise. Finally, GRPO-based reinforcement learning enables the reasoning and refinement modules to co-evolve throughout training, without the constraints of staged optimization. Extensive experiments on four benchmarks demonstrate that DiffuReason consistently improves diverse backbone architectures. Online A/B tests on a large-scale industrial platform further validate its practical effectiveness.
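
To make the GRPO step more concrete, the sketch below illustrates the group-relative advantage that gives the method its name: several candidate outputs are sampled for the same user history, each is scored, and the scores are standardized within the group. This is a minimal illustration, not the paper's implementation. The reward used here (reciprocal rank of the target item) and the function names `reciprocal_rank_reward` and `group_relative_advantages` are assumptions introduced for the example; the abstract states only that GRPO optimizes for ranking performance.

```python
import math


def reciprocal_rank_reward(ranked_items, target):
    """Hypothetical ranking reward: 1 / rank of the target item, 0 if absent.

    The abstract does not specify the reward; this is an assumed stand-in.
    """
    for rank, item in enumerate(ranked_items, start=1):
        if item == target:
            return 1.0 / rank
    return 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group (GRPO-style advantage).

    Each sample is compared with the mean of its own group, so no separate
    value network is needed to provide a baseline.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical sampled rankings for one user whose next item is "x".
group = [
    ["x", "a", "b"],  # target ranked first
    ["a", "x", "b"],  # target ranked second
    ["a", "b", "c"],  # target missing
    ["b", "x", "a"],  # target ranked second
]
rewards = [reciprocal_rank_reward(ranking, "x") for ranking in group]
advantages = group_relative_advantages(rewards)
```

Samples that rank the target item higher than the group average receive a positive advantage and are reinforced, while below-average samples are penalized. In an end-to-end setup of the kind the abstract describes, this signal would reach both the reasoning and the refinement modules.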

Problem

Research questions and friction points this paper is trying to address.

latent reasoning

sequential recommendation

uncertainty

staged training

noise accumulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

latent reasoning

diffusion-based refinement

Thinking Tokens