AI Summary
This work addresses the challenges in generative recommender systems arising from quantization-induced ID collisions and information loss, as well as the mismatch between input representations and output generation in terms of granularity and structure. To this end, the authors propose ComeIR, a novel framework that introduces a conditional memory mechanism featuring a two-level Engram memory module to reconstruct semantic identifier (SID) embeddings. This design preserves structural evidence of SIDs while enhancing item-level representations. During decoding, ComeIR integrates a memory-augmented prediction head with multimodal-guided token scoring to recover token-level granularity and improve generation accuracy. Experimental results demonstrate that ComeIR effectively mitigates the tension between identity preservation and structural fidelity, resolves input-output granularity misalignment, and exhibits consistent performance gains as the scale of the conditional memory increases.
Abstract
Generative recommendation (GR) has emerged as a promising paradigm that predicts target items by autoregressively generating their semantic identifiers (SIDs). Most GR methods follow a quantization-representation-generation pipeline: they first assign each item an SID, then construct input representations from SID-token embeddings, and finally predict the target SID through autoregressive generation. Existing item-level representation constructions mainly take two forms: directly merging SID-token embeddings into a compact vector, or enriching item-level representations with external inputs through additional networks. However, these item-level constructors still face two practical challenges: direct merging may amplify the information loss caused by quantization and ID collision while obscuring SID code relations, whereas external-input-based methods can strengthen item semantics but cannot reliably preserve the SID-structured evidence required for token-level generation. These limitations make representation construction an underexplored bottleneck, leading to two severe problems, i.e., the Identity-Structure Preservation Conflict and the Input-Output Granularity Mismatch. To this end, we propose ComeIR, a Conditional Memory enhanced Item Representation framework that reconstructs SID-token embeddings into item-aware inputs and restores the token granularity during SID decoding. Specifically, multimodal (MM)-guided token scoring adaptively estimates the contribution of each code within the SID, dual-level Engram memory captures intra-item code composition and inter-item transition patterns, and a memory-restoring prediction head reuses the memories during SID decoding. Extensive experiments demonstrate the effectiveness and flexibility of ComeIR, and further reveal scalable gains from enlarging conditional memory.
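As a rough illustration of the two failure modes the abstract attributes to direct merging, the toy sketch below shows an ID collision and an order-insensitive merge. It is not ComeIR or any published method: the catalog, the codebook, the single shared codebook across SID levels, and mean pooling as the merge operator are all assumptions chosen only to make the effect visible.

```python
from collections import defaultdict

# Hypothetical toy catalog: item -> SID (a tuple of codes, one per level).
# Items "b" and "c" were quantized to the same SID (an ID collision).
ITEM_TO_SID = {
    "a": (0, 1, 2),
    "b": (2, 1, 0),
    "c": (2, 1, 0),
}

# Hypothetical codebook shared by all levels: code -> 2-d embedding.
# This sharing is an assumption made for the sake of the illustration.
CODE_EMBEDDING = {
    0: (1.0, 0.0),
    1: (0.0, 1.0),
    2: (1.0, 1.0),
}


def merge_by_mean(sid):
    """Direct merging: average the SID-token embeddings into one item vector."""
    vectors = [CODE_EMBEDDING[code] for code in sid]
    dim = len(vectors[0])
    return tuple(sum(v[d] for v in vectors) / len(vectors) for d in range(dim))


def find_collisions(item_to_sid):
    """Group items by SID and keep only the SIDs shared by several items."""
    groups = defaultdict(list)
    for item, sid in item_to_sid.items():
        groups[sid].append(item)
    return {sid: items for sid, items in groups.items() if len(items) > 1}


merged = {item: merge_by_mean(sid) for item, sid in ITEM_TO_SID.items()}
collisions = find_collisions(ITEM_TO_SID)

# "a" and "b" have different SIDs but identical merged vectors, because
# averaging discards the order in which the codes appear.
print(merged["a"] == merged["b"])
# "b" and "c" cannot be told apart at all, since they share one SID.
print(collisions)
```

In this toy setting, the merged vector can no longer distinguish items whose SIDs differ only in code order, and colliding items are identical before any merging takes place. This is the kind of lost identity and structural evidence that the abstract says an item-level constructor needs to recover.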