RASTP: Representation-Aware Semantic Token Pruning for Generative Recommendation with Semantic Identifiers

📅 2025-11-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Generative recommender systems incur high computational and memory overhead because representing each item with multiple semantic identifier (SID) tokens produces long input sequences. While prior work optimizes attention mechanisms or KV caching, this paper introduces the first **representation-aware semantic token pruning method**, which dynamically identifies and removes low-information tokens by jointly modeling their **representation magnitude** and **attention centrality**. The approach combines semantic saliency analysis, cumulative attention weight estimation, and adaptive pruning, compressing the sequence without compromising recommendation accuracy. Experiments on three Amazon datasets show an average 26.7% reduction in training time while maintaining or improving key metrics such as Recall@10. The implementation is publicly available.

📝 Abstract
Generative recommendation systems typically leverage Semantic Identifiers (SIDs), which represent each item as a sequence of tokens that encode semantic information. However, representing each item ID with multiple SID tokens significantly increases the input sequence length, which is a major determinant of computational complexity and memory consumption. While existing efforts primarily focus on optimizing attention computation and the KV cache, we propose RASTP (Representation-Aware Semantic Token Pruning), which directly prunes less informative tokens in the input sequence. Specifically, RASTP evaluates token importance by combining semantic saliency, measured via representation magnitude, and attention centrality, derived from cumulative attention weights. Because RASTP dynamically prunes low-information or irrelevant semantic tokens, it shortens the input sequence: experiments on three real-world Amazon datasets show that RASTP reduces training time by 26.7% while maintaining or slightly improving recommendation performance. The code has been open-sourced at https://github.com/Yuzt-zju/RASTP.
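
The scoring step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (the linked repository is the reference for that). The function name `prune_tokens`, the `keep_ratio` parameter, the use of the L2 norm as "representation magnitude", summing the attention each token receives as "cumulative attention weights", and multiplying the two normalized scores are all assumptions made for illustration.

```python
import numpy as np


def prune_tokens(hidden, attn, keep_ratio=0.5):
    """Hypothetical sketch of representation-aware token pruning.

    hidden: (seq_len, d) token representations.
    attn:   (heads, seq_len, seq_len) attention weights, rows are queries.
    Returns the kept token indices (in original order) and the pruned states.
    """
    seq_len = hidden.shape[0]
    eps = 1e-12  # guards against division by zero

    # Semantic saliency: magnitude (L2 norm, an assumption) of each token.
    saliency = np.linalg.norm(hidden, axis=-1)

    # Attention centrality: attention each token receives as a key,
    # averaged over heads and accumulated over all queries.
    centrality = attn.mean(axis=0).sum(axis=0)

    # Assumed combination rule: product of the max-normalized scores.
    score = (saliency / (saliency.max() + eps)) * (centrality / (centrality.max() + eps))

    # Keep the top-k tokens, then restore their original order.
    k = max(1, int(np.ceil(keep_ratio * seq_len)))
    keep = np.sort(np.argsort(-score)[:k])
    return keep, hidden[keep]
```

Calling `prune_tokens(hidden, attn, keep_ratio=0.5)` on a 4-token sequence returns two indices and a `(2, d)` array. Sorting the kept indices preserves the relative order of the surviving SID tokens, which matters for a sequential model.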

Problem

Research questions and friction points this paper is trying to address.

Reducing computational complexity in generative recommendation systems with semantic identifiers
Pruning less informative tokens to decrease input sequence length
Maintaining recommendation performance while reducing training time significantly
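
To make the link between sequence length and cost concrete, here is a small arithmetic sketch. The numbers (50 items, 4 SID tokens per item, head dimension 64, half the tokens pruned) are illustrative assumptions, not values from the paper, and the helper names are hypothetical. It only models the attention-score term, which grows quadratically with sequence length in standard self-attention; it says nothing about end-to-end training time.

```python
def sid_sequence_length(num_items: int, tokens_per_item: int) -> int:
    """Input length when every item is expanded into several SID tokens."""
    return num_items * tokens_per_item


def attention_score_cost(seq_len: int, head_dim: int) -> int:
    """Approximate multiply-adds for one head's QK^T: seq_len^2 * head_dim."""
    return seq_len * seq_len * head_dim


# Illustrative numbers only (assumptions, not taken from the paper).
full_len = sid_sequence_length(num_items=50, tokens_per_item=4)
pruned_len = full_len // 2  # suppose half of the tokens are pruned

cost_ratio = attention_score_cost(pruned_len, 64) / attention_score_cost(full_len, 64)
```

Under these assumptions, halving the sequence cuts the attention-score term to a quarter. Other parts of the model scale linearly with length, so the overall saving is smaller than this term alone suggests.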

Innovation

Methods, ideas, or system contributions that make the work stand out.

Prunes less informative tokens in input sequences
Combines semantic saliency and attention centrality
Dynamically removes low-information semantic tokens
Tianyu Zhan
Zhejiang University
Kairui Fu
Zhejiang University
Zheqi Lv
Zhejiang University
Shengyu Zhang
Zhejiang University