🤖 AI Summary
This work identifies and addresses the semantic dilution effect caused by high-granularity semantic IDs in generative recommendation, an effect that substantially increases training cost and destabilizes performance. The study reveals, for the first time, that this single effect is the common root of both the efficiency and the stability challenges. To mitigate it, the authors propose the STAMP framework, which couples input purification with supervision-signal enhancement: at the input end, Semantic Adaptive Pruning (SAP) compresses redundant sequences, while at the output end, Multi-step Auxiliary Prediction (MAP) densifies the supervision signal. Experiments on Amazon and industrial-scale datasets demonstrate that STAMP achieves 1.23–1.38× faster training and reduces GPU memory usage by 17.2%–54.7%, while maintaining or improving recommendation accuracy.
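Neither component is specified in implementation detail here, but the input-side idea can be illustrated roughly. The following PyTorch sketch shows one hypothetical way to score and prune redundant SID tokens before they reach the decoder; the learned linear scorer, fixed keep ratio, and top-k selection are illustrative assumptions, not the paper's actual SAP mechanism.

```python
import torch
import torch.nn as nn

class AdaptiveTokenPruner(nn.Module):
    """Hypothetical score-and-keep pruning of embedded SID tokens."""

    def __init__(self, d_model: int, keep_ratio: float = 0.5):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)  # learned per-token importance score
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) embedded SID sequence
        scores = self.scorer(x).squeeze(-1)            # (batch, seq_len)
        k = max(1, int(x.size(1) * self.keep_ratio))   # number of tokens to keep
        idx = scores.topk(k, dim=1).indices            # top-k token positions
        idx = idx.sort(dim=1).values                   # restore original ordering
        return x.gather(1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))

# Example: prune two sequences of 12 SID tokens down to 6 tokens each.
pruner = AdaptiveTokenPruner(d_model=64, keep_ratio=0.5)
out = pruner(torch.randn(2, 12, 64))  # shape (2, 6, 64)
```

Sorting the retained indices keeps the surviving tokens in their original order, so the compressed sequence still reads as a (shorter) interaction history.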
📝 Abstract
Generative Recommendation (GR) has recently transitioned from atomic item indexing to Semantic ID (SID)-based frameworks to capture intrinsic item relationships and enhance generalization. However, adopting high-granularity SIDs introduces two critical challenges: prohibitive training overhead due to sequence expansion, and unreliable performance characterized by non-monotonic accuracy fluctuations. We identify that these seemingly disparate issues are fundamentally rooted in the Semantic Dilution Effect, where redundant tokens waste massive computation and dilute the already sparse learning signals in recommendation. To counteract this, we propose STAMP (Semantic Trimming and Auxiliary Multi-step Prediction), a framework built on a dual-end optimization strategy. We argue that effective SID learning requires simultaneously addressing low input information density and sparse output supervision. On the input side, Semantic Adaptive Pruning (SAP) dynamically filters redundancy during the forward pass, converting noise-laden sequences into compact, information-rich representations. On the output side, Multi-step Auxiliary Prediction (MAP) employs a multi-token objective to densify feedback, strengthening long-range dependency capture and ensuring robust learning signals despite compressed inputs. By unifying input purification and signal amplification, STAMP improves both training efficiency and representation capability. Experiments on public Amazon and large-scale industrial datasets show that STAMP achieves a 1.23–1.38× speedup and a 17.2%–54.7% VRAM reduction while maintaining or improving performance across multiple architectures.
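The output-side idea can likewise be sketched as extra prediction heads that each target a token several steps ahead, densifying the supervision available at every position. The head layout, uniform loss averaging, and padding handling below are assumptions for illustration, not the paper's exact MAP formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def multi_step_auxiliary_loss(hidden, targets, heads, pad_id=0):
    """Hypothetical multi-step objective: head k predicts the token k steps ahead.

    hidden:  (batch, seq_len, d_model) decoder hidden states
    targets: (batch, seq_len) ground-truth SID tokens
    heads:   nn.ModuleList of nn.Linear(d_model, vocab_size) projections
    """
    loss = hidden.new_zeros(())
    for k, head in enumerate(heads, start=1):
        logits = head(hidden[:, :-k, :])   # positions that still have a k-step target
        labels = targets[:, k:]            # the token k steps ahead of each position
        loss = loss + F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            labels.reshape(-1),
            ignore_index=pad_id,
        )
    return loss / len(heads)

# Example: two heads (predict 1 and 2 steps ahead) over a toy batch.
batch, seq_len, d_model, vocab = 2, 10, 64, 1000
heads = nn.ModuleList([nn.Linear(d_model, vocab) for _ in range(2)])
loss = multi_step_auxiliary_loss(
    torch.randn(batch, seq_len, d_model),
    torch.randint(1, vocab, (batch, seq_len)),
    heads,
)
```

Because every position contributes a loss term for each look-ahead step, the per-sequence gradient signal grows roughly with the number of heads, which is one plausible way such an objective offsets the supervision lost to input pruning.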