🤖 AI Summary
Recommender systems often suffer from popularity bias: popular items are overexposed while long-tail items are neglected, which compromises both fairness and recommendation quality. This work proposes PopSteer, an approach that, for the first time, brings neuron-level interpretability to popularity-bias mitigation. PopSteer trains a sparse autoencoder (SAE) post hoc on a pre-trained recommendation model, identifies the SAE neurons that encode popularity signals, and adjusts their activations. This gives fine-grained, interpretable control over fairness. Experiments on three public datasets with a sequential recommendation model show that PopSteer significantly improves recommendation fairness with minimal impact on accuracy, and that it allows fine-grained tuning of the fairness-accuracy trade-off.
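The summary describes training an SAE post hoc on a pre-trained model. The sketch below shows the objective a standard SAE minimizes on one hidden state of a frozen recommender. It assumes a ReLU encoder, a linear decoder, and an MSE-plus-L1 loss. The abstract does not specify PopSteer's exact SAE variant, and the function names (`encode`, `decode`, `sae_loss`) are illustrative, not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: Matrix[i][j]


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def encode(h: Vector, W_enc: Matrix, b_enc: Vector) -> Vector:
    """Map a d-dim hidden state to m latent activations: z = ReLU(W_enc h + b_enc)."""
    return [relu(sum(w * x for w, x in zip(row, h)) + b)
            for row, b in zip(W_enc, b_enc)]


def decode(z: Vector, W_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstruct the hidden state: h_hat = W_dec z + b_dec (W_dec is d x m)."""
    return [sum(w * a for w, a in zip(row, z)) + b
            for row, b in zip(W_dec, b_dec)]


def sae_loss(h: Vector, W_enc: Matrix, b_enc: Vector,
             W_dec: Matrix, b_dec: Vector, l1_coeff: float) -> float:
    """Mean squared reconstruction error plus an L1 penalty that encourages sparse latents."""
    z = encode(h, W_enc, b_enc)
    h_hat = decode(z, W_dec, b_dec)
    mse = sum((a - b) ** 2 for a, b in zip(h, h_hat)) / len(h)
    return mse + l1_coeff * sum(abs(a) for a in z)


# Toy example (hypothetical weights): d = 2 hidden dims, m = 4 latents.
h = [1.0, -2.0]
W_enc = [[1, 0], [0, 1], [-1, 0], [0, -1]]
b_enc = [0.0] * 4
W_dec = [[1, 0, -1, 0], [0, 1, 0, -1]]
b_dec = [0.0] * 2

z = encode(h, W_enc, b_enc)        # [1.0, 0.0, 0.0, 2.0]
h_hat = decode(z, W_dec, b_dec)    # [1.0, -2.0]
loss = sae_loss(h, W_enc, b_enc, W_dec, b_dec, l1_coeff=0.1)
print(z, h_hat, loss)
```

Because the recommender stays frozen, only the SAE weights would be optimized against this loss. The wider latent layer (m > d) together with the sparsity penalty is what makes individual latents candidates for interpretable features such as popularity.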
📝 Abstract
Popularity bias is a pervasive challenge in recommender systems, where a few popular items dominate attention while the majority of less popular items remain underexposed. This imbalance can reduce recommendation quality and lead to unfair item exposure. Although existing mitigation methods address this issue to some extent, they often lack transparency in how they operate. In this paper, we propose a post-hoc approach, PopSteer, that leverages a Sparse Autoencoder (SAE) to both interpret and mitigate popularity bias in recommendation models. The SAE is trained to replicate a trained model's behavior while enabling neuron-level interpretability. By introducing synthetic users with strong preferences for either popular or unpopular items, we identify neurons encoding popularity signals through their activation patterns. We then steer recommendations by adjusting the activations of the most biased neurons. Experiments on three public datasets with a sequential recommendation model demonstrate that PopSteer significantly enhances fairness with minimal impact on accuracy, while providing interpretable insights and fine-grained control over the fairness-accuracy trade-off.
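The identify-then-steer steps can be sketched as follows, under stated assumptions. SAE activations are collected for two groups of synthetic users, one preferring popular items and one preferring unpopular items. Each neuron is scored by the difference in its mean activation between the groups. The abstract says only that neurons are found "through their activation patterns," so the mean-difference score is an assumption, and so is the multiplicative steering rule. `popularity_scores`, `steer`, `k`, and `alpha` are hypothetical names.

```python
from statistics import mean
from typing import List

Vector = List[float]


def popularity_scores(pop_acts: List[Vector], niche_acts: List[Vector]) -> List[float]:
    """Per-neuron score: mean activation over popular-preferring synthetic users minus
    mean activation over unpopular-preferring ones (positive = leans popular)."""
    n_neurons = len(pop_acts[0])
    return [mean(a[j] for a in pop_acts) - mean(a[j] for a in niche_acts)
            for j in range(n_neurons)]


def steer(z: Vector, scores: List[float], k: int, alpha: float) -> Vector:
    """Dampen the k most popularity-leaning neurons and amplify the k most
    niche-leaning ones; alpha >= 0 sets the intervention strength (0 = no change)."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    niche_idx, pop_idx = set(order[:k]), set(order[-k:])
    out = list(z)
    for j in range(len(out)):
        if j in pop_idx and scores[j] > 0:
            out[j] *= (1.0 - alpha)   # suppress a popularity-encoding neuron
        elif j in niche_idx and scores[j] < 0:
            out[j] *= (1.0 + alpha)   # boost a long-tail-encoding neuron
    return out


# Toy activations (hypothetical) for two synthetic users per group, 3 SAE neurons.
pop_acts = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
niche_acts = [[0.0, 3.0, 1.0], [0.0, 1.0, 1.0]]
scores = popularity_scores(pop_acts, niche_acts)   # [3.0, -2.0, 0.0]

z = [1.0, 1.0, 1.0]                                # a real user's SAE activations
steered = steer(z, scores, k=1, alpha=0.5)         # [0.5, 1.5, 1.0]
print(scores, steered)
```

In a full pipeline, the steered activations would be decoded back through the SAE and passed to the recommender's scoring layer. Sweeping `k` and `alpha` is one plausible way to realize the fine-grained fairness-accuracy control the abstract describes.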