🤖 AI Summary
This work addresses fairness issues in diffusion-based recommendation models (DiffRec and its variant L-DiffRec) that stem from training-data bias, presenting one of the first systematic evaluations of the trade-off between recommendation utility and multidimensional fairness. The authors construct a benchmarking framework for fairness assessment in diffusion-based recommendation, evaluating DiffRec and L-DiffRec alongside nine state-of-the-art recommendation models on two fairness-aware datasets. The framework measures performance with six metrics spanning accuracy and consumer/provider fairness, including demographic parity (DP), equal opportunity (EO), consumer preference equality (CPE), and group preference equality (GPE). Experimental results reveal that DiffRec exhibits notable inherent bias, that recommendation accuracy correlates strongly and negatively with the fairness dimensions, and that while L-DiffRec improves utility, it further degrades fairness. The study establishes a reproducible benchmark, provides quantitative evidence of the bias–utility trade-off, and offers concrete directions for debiasing diffusion-based recommender systems.
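To make the fairness metrics above concrete, the sketch below shows one common way a demographic-parity-style gap is computed over top-k recommendation lists: compare the rate at which a set of target items (e.g. niche or protected-provider items) is exposed to two consumer groups. This is an illustrative formulation under simplifying assumptions (binary groups, set-based exposure); the paper's exact metric definitions may differ, and the function name is hypothetical.

```python
def demographic_parity_gap(topk, groups, target_items, k=10):
    """Absolute gap, between two user groups, in the rate at which
    `target_items` occupy slots in users' top-k recommendation lists.

    topk: dict mapping user id -> ranked list of recommended item ids
    groups: dict mapping user id -> group label (exactly two groups assumed)
    target_items: set of item ids whose exposure we audit
    """
    rates = {}
    for g in set(groups.values()):
        users = [u for u, ug in groups.items() if ug == g]
        # Count target items appearing in each user's top-k slots.
        hits = sum(len(set(topk[u][:k]) & target_items) for u in users)
        rates[g] = hits / (k * len(users))
    a, b = rates.values()  # relies on the two-group assumption
    return abs(a - b)


# Toy example: group A sees target items {1, 2} far more often than group B.
topk = {"u1": [1, 2], "u2": [3, 4], "u3": [1, 3], "u4": [2, 4]}
groups = {"u1": "A", "u2": "B", "u3": "A", "u4": "B"}
gap = demographic_parity_gap(topk, groups, target_items={1, 2}, k=2)
```

A gap of 0 means both groups receive the target items at the same rate; larger values indicate a disparity of the kind the benchmark's consumer/provider fairness metrics are designed to surface.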
📝 Abstract
Diffusion-based recommender systems have recently been shown to outperform traditional generative recommendation approaches, such as variational autoencoders and generative adversarial networks. Nevertheless, the machine learning literature has raised concerns that diffusion models, while learning the distribution of data samples, may inadvertently absorb information bias and lead to unfair outcomes. In light of this aspect, and considering the relevance that fairness has held in recommendation over the last few decades, we conduct one of the first fairness investigations in the literature on DiffRec, a pioneering approach in diffusion-based recommendation. First, we propose an experimental setting involving DiffRec (and its variant L-DiffRec) along with nine state-of-the-art recommendation models, two popular datasets from the fairness-aware literature, and six metrics accounting for accuracy and consumer/provider fairness. Then, we perform a twofold analysis: one part assesses models' performance under accuracy and recommendation fairness separately, and the other identifies whether, and to what extent, these metrics can strike a performance trade-off. Experimental results from both studies confirm the initial unfairness warnings and point toward ways of addressing them in future research.