📝 Abstract
Adding explanations to recommender systems is said to have multiple benefits, such as increasing user trust or system transparency. Previous work from other application areas suggests that specific user characteristics affect how users perceive explanations. However, such evaluations remain rare for explanations in recommender systems. This paper addresses this gap by surveying 124 papers in which recommender systems explanations were evaluated in user studies. We analyzed their participant descriptions and, where measured, their results on the impact of user characteristics on explanation effects. Our findings suggest that the surveyed studies predominantly cover specific user groups who do not necessarily represent the users of recommender systems in the evaluation domain. This may seriously hamper the generalizability of any insights gained from current studies on explanations in recommender systems. We further find inconsistencies in data reporting, which impair the reproducibility of the reported results. Hence, we recommend actions to move toward a more inclusive and reproducible evaluation.