LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high latency of token-by-token generation in large language models for generative recommendation, as well as key limitations of latent-space reasoning—including semantic misalignment, representation drift, and fixed inference depth—by proposing the LASAR framework. LASAR introduces adaptive latent-space reasoning into generative recommendation through a two-stage training process: first aligning semantic IDs, then performing latent-space reasoning. It integrates explicit chain-of-thought semantic alignment, stepwise bidirectional KL regularization, and a sample-level dynamic inference depth prediction mechanism, trained via an SFT-then-RL paradigm with GRPO/REINFORCE-based policy optimization. Experiments on three real-world datasets demonstrate that LASAR significantly outperforms baseline methods, achieving approximately 20× faster inference than explicit chain-of-thought approaches, nearly halving the average number of latent reasoning steps, and incurring only minimal additional latency overhead.
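As a rough illustration of the sample-level depth prediction described above, the following sketch shows how a policy head over candidate reasoning depths could be updated with a REINFORCE-style gradient. It is not the paper's implementation: the function names, the scalar reward, and the baseline are hypothetical, and the summary does not specify how the reward is defined.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_depth(logits, rng):
    """Sample a reasoning depth in 1..len(logits) from the policy head's logits."""
    probs = softmax(logits)
    depths = list(range(1, len(probs) + 1))
    return rng.choices(depths, weights=probs)[0]


def reinforce_grad(logits, depth, reward, baseline):
    """Gradient of the loss -(reward - baseline) * log pi(depth) w.r.t. the logits.

    Uses d log softmax_k / d logit_i = 1[i == k] - p_i.
    `reward` and `baseline` are assumed scalars (hypothetical, e.g. a
    recommendation-quality score and its running mean).
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        -advantage * ((1.0 if i == depth - 1 else 0.0) - p)
        for i, p in enumerate(probs)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0, 0.0, 0.0, 0.0]  # four candidate depths, uniform at start
    depth = sample_depth(logits, rng)
    grad = reinforce_grad(logits, depth, reward=1.0, baseline=0.2)
    # One gradient-descent step: a positive advantage raises the sampled depth's probability.
    logits = [l - 0.5 * g for l, g in zip(logits, grad)]
    print(depth, softmax(logits))
```

Under these assumptions, depths that earn above-baseline reward become more likely, which is the general mechanism by which a learned policy can shift probability toward shallower reasoning when extra steps do not pay off.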

📝 Abstract

Large Language Models (LLMs) have demonstrated powerful reasoning capabilities through Chain-of-Thought (CoT) in various tasks, yet the inefficiency of token-by-token generation hinders real-world deployment in latency-sensitive recommender systems. Latent reasoning has emerged as an effective paradigm in LLMs, performing multi-step inference in a continuous hidden-state space to achieve stronger reasoning at lower cost. However, this paradigm remains underexplored in mainstream generative recommendation. Adapting it reveals three unique challenges: (1) the gap between prior-less Semantic ID (SID) symbols and continuous latent reasoning: SIDs lack pre-trained semantics, which hinders joint optimization; (2) representation drift due to a lack of reasoning chain supervision; and (3) the suboptimality of applying a globally fixed reasoning depth. To address these, we propose LASAR (Latent Adaptive Semantic Aligned Reasoning), an SFT-then-RL framework. First, we bridge this gap via two-stage training: Stage 1 grounds SID semantics before Stage 2 introduces latent reasoning, ensuring efficient convergence. Second, we mitigate representation drift through explicit CoT semantic alignment. Step-wise bidirectional KL divergence constrains the latent reasoning trajectory using hidden-state anchors extracted from CoT text, while a Policy Head predicts per-sample reasoning depth. Third, during the GRPO-based RL phase, terminal-only KL alignment accommodates variable-length reasoning, and REINFORCE optimizes the Policy Head to dynamically allocate steps. This nearly halves the average latent step count while simultaneously improving recommendation quality. Experiments on three real-world datasets demonstrate that LASAR outperforms all baselines. It adds marginal inference latency and is roughly 20 times faster than generating explicit CoT text.
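To make the step-wise bidirectional KL term more concrete, here is a minimal sketch of one way such a loss could be computed. The abstract does not say how hidden states are turned into distributions, so this example assumes each latent state and each CoT-derived anchor is softmax-normalized first; that choice, along with every function name here, is an assumption for illustration rather than the authors' formulation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def bidirectional_kl(p, q):
    """Symmetric combination KL(p || q) + KL(q || p)."""
    return kl(p, q) + kl(q, p)


def stepwise_alignment_loss(latent_states, anchor_states):
    """Mean bidirectional KL between each latent step and its anchor.

    Both arguments are lists of equal length; entry t is the vector for
    reasoning step t. Softmax normalization is an assumption made here.
    """
    if not latent_states or len(latent_states) != len(anchor_states):
        raise ValueError("need the same non-zero number of latent and anchor steps")
    per_step = [
        bidirectional_kl(softmax(h), softmax(a))
        for h, a in zip(latent_states, anchor_states)
    ]
    return sum(per_step) / len(per_step)


if __name__ == "__main__":
    latent = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
    anchors = [[3.0, 2.0, 1.0], [0.5, 0.5, 0.5]]
    print(stepwise_alignment_loss(latent, anchors))
```

A terminal-only variant, as described for the RL phase, would under the same assumptions apply `bidirectional_kl` to the final step alone, which sidesteps the need for latent and anchor trajectories to have matching lengths.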

Problem

Research questions and friction points this paper is trying to address.

latent reasoning

generative recommendation

semantic alignment

representation drift

reasoning depth

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Reasoning

Semantic Alignment

Adaptive Reasoning Depth