🤖 AI Summary
Generative recommender systems suffer from excessive computational and memory overhead due to the long input sequences induced by semantic identifiers (SIDs) for item representation. While prior work optimizes attention mechanisms or KV caching, this paper introduces the first **representation-aware semantic token pruning method**, which dynamically identifies and removes low-information tokens by jointly modeling their **representation magnitude** and **attention centrality**. The approach integrates semantic saliency analysis, cumulative attention weight estimation, and adaptive pruning, enabling sequence compression without compromising recommendation accuracy. Extensive experiments on three Amazon datasets demonstrate an average 26.7% reduction in training time while maintaining or improving key metrics such as Recall@10. The implementation is publicly available.
📝 Abstract
Generative recommendation systems typically leverage Semantic Identifiers (SIDs), which represent each item as a sequence of tokens that encode semantic information. However, representing each item ID with multiple SID tokens significantly increases the input sequence length, which is a major determinant of computational complexity and memory consumption. While existing efforts primarily focus on optimizing attention computation and the KV cache, we propose RASTP (Representation-Aware Semantic Token Pruning), which directly prunes less informative tokens from the input sequence. Specifically, RASTP evaluates token importance by combining semantic saliency, measured via representation magnitude, and attention centrality, derived from cumulative attention weights. By dynamically pruning low-information or irrelevant semantic tokens, RASTP reduces training time by 26.7% on three real-world Amazon datasets, while maintaining or slightly improving recommendation performance. The code has been open-sourced at https://github.com/Yuzt-zju/RASTP.
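To make the scoring idea concrete, here is a minimal sketch of how the two signals described in the abstract could be combined: semantic saliency as the L2 norm of each token's representation, and attention centrality as the cumulative attention weight each token receives. The function names, the mixing weight `alpha`, the normalization scheme, and the `keep_ratio` pruning budget are all illustrative assumptions, not details from the paper.

```python
import numpy as np

def rastp_importance(hidden, attn, alpha=0.5):
    """Score tokens by mixing semantic saliency and attention centrality.

    hidden: (seq_len, d) array of token representations
    attn:   (seq_len, seq_len) attention weights (rows are queries)
    alpha:  hypothetical mixing weight -- not specified in the abstract
    """
    # Semantic saliency: representation magnitude (L2 norm per token).
    saliency = np.linalg.norm(hidden, axis=-1)
    # Attention centrality: cumulative attention each token receives,
    # i.e. the column sums of the attention matrix.
    centrality = attn.sum(axis=0)
    # Normalize both signals to [0, 1] before mixing (illustrative choice).
    saliency = saliency / (saliency.max() + 1e-9)
    centrality = centrality / (centrality.max() + 1e-9)
    return alpha * saliency + (1 - alpha) * centrality

def prune_tokens(tokens, scores, keep_ratio=0.75):
    """Keep the top keep_ratio fraction of tokens, preserving order."""
    k = max(1, int(len(tokens) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])  # top-k indices, in order
    return [tokens[i] for i in keep]
```

In this sketch, tokens with both small representation norms and little incoming attention score lowest and are dropped first, which is the intuition behind shortening the SID sequence without discarding semantically salient items.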