Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control

📅 2025-07-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing Transformer-based sequential recommendation models suffer from poor interpretability and limited controllability. To address this, this paper introduces sparse autoencoders (SAEs) into sequential recommendation for the first time, performing sparse linear decomposition of Transformer hidden states to extract semantically coherent and highly unambiguous feature directions. Compared to raw hidden representations, the SAE-learned directions exhibit significantly enhanced interpretability and controllability. By enabling direction-level interventions—such as activating or suppressing specific features—the framework supports fine-grained, scenario-adaptive control over recommendation behavior. Extensive experiments on multiple benchmark datasets demonstrate that the proposed method not only improves model transparency but also enables precise behavioral customization. This work establishes a novel paradigm for interpretable and controllable sequential recommendation, bridging the gap between representation learning and human-understandable, actionable model semantics.
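The sparse linear decomposition described above can be sketched as follows. This is a minimal, hypothetical illustration using NumPy with randomly initialized weights, not the paper's actual trained model: a hidden state is encoded into a wide, non-negative (and, after training, sparse) feature vector, then reconstructed as a linear combination of learned decoder directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration: transformer hidden size 64,
# overcomplete SAE dictionary of 512 feature directions.
d_model, d_sae = 64, 512
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))  # one direction per row
b_dec = np.zeros(d_model)

def sae_forward(h):
    """Encode hidden states into feature activations, then reconstruct them."""
    f = np.maximum(h @ W_enc + b_enc, 0.0)  # ReLU keeps only active directions
    h_hat = f @ W_dec + b_dec               # sparse linear combination of directions
    return f, h_hat

h = rng.normal(size=(8, d_model))           # a batch of transformer hidden states
f, h_hat = sae_forward(h)
```

In practice the SAE is trained to minimize reconstruction error plus a sparsity penalty (e.g. an L1 term on `f`), which is what drives the activations toward the sparse, monosemantic directions the paper analyzes.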

📝 Abstract
Many current state-of-the-art models for sequential recommendations are based on transformer architectures. Interpretation and explanation of such black-box models is an important research question, as a better understanding of their internals can help understand, influence, and control their behavior, which is very important in a variety of real-world applications. Recently, sparse autoencoders (SAEs) have been shown to be a promising unsupervised approach for extracting interpretable features from language models. These autoencoders learn to reconstruct hidden states of the transformer's internal layers from sparse linear combinations of directions in their activation space. This paper focuses on the application of SAEs to the sequential recommendation domain. We show that this approach can be successfully applied to a transformer trained on a sequential recommendation task: the learned directions turn out to be more interpretable and monosemantic than the original hidden state dimensions. Moreover, we demonstrate that the features learned by the SAE can be used to effectively and flexibly control the model's behavior, providing end-users with a straightforward method to adjust their recommendations to different custom scenarios and contexts.
Problem

Research questions and friction points this paper is trying to address.

Interpret black-box transformer models for sequential recommendations
Apply sparse autoencoders to extract interpretable recommendation features
Enable flexible control of model behavior for customized scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse autoencoders interpret transformer hidden states
Monosemantic directions improve recommendation interpretability
Flexible control via learned features for customization
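The "flexible control" idea above typically amounts to direction-level steering: adding or subtracting a learned decoder direction from the hidden state before it flows onward. The snippet below is a hedged sketch of that common recipe with made-up weights and a hypothetical `steer` helper; the paper's exact intervention mechanism may differ.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed sizes for illustration; W_dec holds one learned direction per SAE feature.
d_model, d_sae = 64, 512
W_dec = rng.normal(0, 0.1, (d_sae, d_model))

def steer(h, feature_idx, alpha):
    """Shift a hidden state along one learned direction.

    alpha > 0 activates (boosts) the feature; alpha < 0 suppresses it.
    """
    return h + alpha * W_dec[feature_idx]

h = rng.normal(size=d_model)                         # a transformer hidden state
h_boosted = steer(h, feature_idx=42, alpha=5.0)      # activate feature 42
h_suppressed = steer(h, feature_idx=42, alpha=-5.0)  # suppress feature 42
```

Because the shift is a single vector addition, it can be applied per request, per user, or per scenario, which is what makes the control fine-grained and scenario-adaptive.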