Opening the Black Box: Interpretable Remedies for Popularity Bias in Recommender Systems

📅 2025-08-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Popularity bias in recommender systems leads to insufficient exposure of long-tail items, undermining fairness and overall recommendation quality. To address this, we propose a post-hoc method based on sparse autoencoders (SAEs), the first to leverage SAEs for interpretable identification and mitigation of neuron-level popularity bias in deep recommendation models. Synthetic user activation patterns are used to localize biased neurons, enabling fine-grained intervention at inference time. The approach requires no modification to the original model architecture or training objective, which keeps it transparent and operationally flexible. Experiments on two public benchmark datasets show that the method significantly improves recommendation fairness (e.g., reducing ILAD and the Gini coefficient by 12.6%–28.3%) while incurring less than 0.5% degradation in accuracy, yielding a controllable and interpretable fairness–accuracy trade-off.

📝 Abstract
Popularity bias is a well-known challenge in recommender systems, where a small number of popular items receive disproportionate attention, while the majority of less popular items are largely overlooked. This imbalance often results in reduced recommendation quality and unfair exposure of items. Although existing mitigation techniques address this bias to some extent, they typically lack transparency in how they operate. In this paper, we propose a post-hoc method using a Sparse Autoencoder (SAE) to interpret and mitigate popularity bias in deep recommendation models. The SAE is trained to replicate a pre-trained model's behavior while enabling neuron-level interpretability. By introducing synthetic users with clear preferences for either popular or unpopular items, we identify neurons encoding popularity signals based on their activation patterns. We then adjust the activations of the most biased neurons to steer recommendations toward fairer exposure. Experiments on two public datasets using a sequential recommendation model show that our method significantly improves fairness with minimal impact on accuracy. Moreover, it offers interpretability and fine-grained control over the fairness-accuracy trade-off.
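The probing step described in the abstract, encoding synthetic users' hidden states with the SAE and ranking latent neurons by how differently they activate for popular-preferring versus unpopular-preferring users, can be sketched as follows. All names, dimensions, and weights here are illustrative placeholders, not the paper's implementation; a real SAE would be trained to reconstruct the recommender's hidden states, and the synthetic hidden states would come from feeding curated interaction histories through the pre-trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: d = recommender hidden size, m = SAE latent size
# (overcomplete, as is typical for sparse autoencoders).
d, m = 16, 64

# Placeholder encoder weights; in practice these are learned with a
# reconstruction + sparsity objective on the frozen model's activations.
W_enc = rng.normal(scale=0.1, size=(d, m))
b_enc = np.zeros(m)

def sae_encode(h):
    """ReLU sparse code for a batch of hidden states h of shape (n, d)."""
    return np.maximum(h @ W_enc + b_enc, 0.0)

# Mocked synthetic users: hidden states for histories of only-popular vs.
# only-unpopular items (the constant offset stands in for a popularity signal).
h_popular = rng.normal(size=(32, d)) + 0.5
h_unpopular = rng.normal(size=(32, d)) - 0.5

# A neuron's bias score: mean activation gap between the two synthetic groups.
gap = sae_encode(h_popular).mean(axis=0) - sae_encode(h_unpopular).mean(axis=0)

# The k latents with the largest absolute gap are candidates for intervention.
k = 5
biased_neurons = np.argsort(-np.abs(gap))[:k]
print(biased_neurons)
```

Ranking by the absolute activation gap keeps the procedure model-agnostic: it needs only forward passes through the frozen recommender and the SAE, with no gradients or retraining.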
Problem

Research questions and friction points this paper is trying to address.

Mitigating popularity bias in recommender systems
Improving fairness while maintaining recommendation accuracy
Providing interpretable neuron-level bias correction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Post-hoc method using Sparse Autoencoder
Neuron-level interpretability for bias identification
Adjusting biased neuron activations for fairness
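The intervention itself, encoding a user's hidden state, dampening the flagged neurons, and decoding back into the recommender's hidden space, could look like the sketch below. The scaling factor `alpha` is a hypothetical knob illustrating the fine-grained fairness–accuracy control the paper describes (`alpha=1` leaves the model unchanged; `alpha=0` fully suppresses the flagged neurons); the weights and neuron indices are again placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 16, 64

# Placeholder SAE weights (stand-ins for a trained sparse autoencoder).
W_enc = rng.normal(scale=0.1, size=(d, m))
W_dec = rng.normal(scale=0.1, size=(m, d))

def debias(h, biased_neurons, alpha=0.0):
    """Encode h, scale the flagged latent neurons by alpha, and decode
    back to the recommender's hidden space for downstream scoring."""
    z = np.maximum(h @ W_enc, 0.0)      # sparse code
    z[:, biased_neurons] *= alpha       # fine-grained, reversible intervention
    return z @ W_dec                    # steered hidden state

h = rng.normal(size=(8, d))
biased_neurons = np.array([3, 17, 42])  # indices produced by the probing step
h_steered = debias(h, biased_neurons, alpha=0.2)
print(h_steered.shape)
```

Because the intervention happens only at inference, inside the SAE's latent space, the pre-trained recommender stays untouched, and the trade-off can be tuned per deployment simply by sweeping `alpha`.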