🤖 AI Summary
Existing large language model (LLM)-based recommendation approaches typically rely on a single latent vector to represent user intent and thus fail to capture the multifaceted nature of user preferences. To overcome this limitation, we propose FLR (Factorized Latent Reasoning), a factorized, disentangled latent reasoning framework that introduces, for the first time in LLM-based recommendation, a multi-factor attention mechanism. This mechanism decomposes implicit user reasoning into multiple independent preference factors, which are disentangled through orthogonality, sparsity, and attention diversity regularization. Furthermore, we integrate FLR with a reinforcement learning strategy based on group-relative policy optimization, which enables stable alignment directly in the latent reasoning space. Experiments show that FLR consistently outperforms strong baselines across multiple benchmark datasets while improving recommendation robustness and interpretability.
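For readers unfamiliar with group-relative policy optimization, the core idea is that each sampled output is scored relative to the other samples drawn for the same input, which avoids a separately learned value baseline. The sketch below shows only that normalization step in its commonly used form. The function name, the reward values, and the `eps` constant are illustrative assumptions; the summary does not specify how FLR defines rewards or applies the resulting advantages in latent space.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within each group of samples (hypothetical helper).

    rewards: array of shape (num_groups, group_size), one row per user/prompt,
             one column per sampled candidate. The reward definition is an
             assumption here; the source does not describe it.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean(axis=-1, keepdims=True)   # per-group baseline
    std = rewards.std(axis=-1, keepdims=True)     # per-group scale
    # Samples above the group mean get positive advantages, below get negative.
    return (rewards - mean) / (std + eps)
```

When every sample in a group receives the same reward, all advantages are zero, so that group contributes no learning signal in this formulation.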
📝 Abstract
Large language models (LLMs) have recently been adopted for recommendation by framing user preference modeling as a language generation problem. However, existing latent reasoning approaches typically represent user intent with a single latent vector, which struggles to capture the inherently multi-faceted nature of user preferences. We propose Factorized Latent Reasoning (FLR), a novel framework for LLM-based sequential recommendation that decomposes latent reasoning into multiple disentangled preference factors. FLR introduces a lightweight multi-factor attention module that iteratively refines a latent thought representation, where each factor attends to distinct aspects of the user's interaction history. To encourage diversity and specialization, we design orthogonality, attention diversity, and sparsity regularization objectives, and dynamically aggregate factor contributions for the final prediction. We further integrate FLR with an efficient reinforcement learning strategy based on group-relative policy optimization, enabling stable alignment directly in the latent reasoning space. Experiments on multiple benchmarks show that FLR consistently outperforms strong baselines while improving robustness and interpretability.
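To make the multi-factor idea concrete, here is a minimal NumPy sketch of one refinement step together with three regularizers of the kind the abstract names. It is an illustration under stated assumptions, not the paper's implementation: the per-factor query vectors, the gating vector `gate_w`, the softmax aggregation, and the exact form of each penalty (squared off-diagonal cosine similarity, pairwise attention overlap, entropy of the aggregation weights) are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def factorized_step(history, queries, gate_w):
    """One refinement step with K factors (all names are illustrative).

    history: (T, d) embeddings of the user's interaction history
    queries: (K, d) one query vector per preference factor (assumed)
    gate_w:  (d,)   scoring vector for dynamic aggregation (assumed)
    """
    d = history.shape[1]
    attn = softmax(queries @ history.T / np.sqrt(d), axis=-1)  # (K, T)
    factors = attn @ history                                   # (K, d)
    gate = softmax(factors @ gate_w)                           # (K,) factor weights
    thought = gate @ factors                                   # (d,) aggregated latent
    return attn, factors, gate, thought

def regularizers(attn, factors, gate):
    """Assumed penalty forms; requires at least two factors (K >= 2)."""
    K = factors.shape[0]
    unit = factors / (np.linalg.norm(factors, axis=1, keepdims=True) + 1e-8)
    gram = unit @ unit.T
    off_diag = gram - np.diag(np.diag(gram))
    # Orthogonality: mean squared cosine similarity between distinct factors.
    ortho = (off_diag ** 2).sum() / (K * (K - 1))
    # Attention diversity: mean overlap between distinct factors' attention maps.
    overlap = attn @ attn.T
    diversity = (overlap.sum() - np.trace(overlap)) / (K * (K - 1))
    # Sparsity: entropy of the aggregation weights (low when few factors dominate).
    sparsity = -(gate * np.log(gate + 1e-8)).sum()
    return ortho, diversity, sparsity
```

In a training loop these three terms would be added to the recommendation loss with tunable weights. All three are near zero when the factors are mutually orthogonal, attend to disjoint history items, and a single factor dominates the aggregation.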