🤖 AI Summary
This paper addresses the distortion that participation bias (i.e., selective nonparticipation by raters) introduces into rating aggregation. The authors propose robust aggregation methods that minimize the worst-case expected squared loss between the aggregated rating and the true mean of all underlying ratings, including unobserved ones. The key idea is to impute unrevealed ratings with a balanced combination of extreme ratings, which yields two aggregators: the Balanced Extremes Aggregator, for settings where the sample size (the number of raters) is known, and the Polarizing-Averaging Aggregator, which becomes optimal as the sample size grows to infinity when it is unknown. The framework combines worst-case risk minimization with a model of rating-dependent nonresponse. Evaluated on synthetic data and a real-world teaching-evaluation dataset, the proposed methods outperform simple averaging and the spectral method, effectively mitigating the systematic distortion induced by participation bias.
📝 Abstract
Rating aggregation plays a crucial role in various fields, such as product recommendations, hotel rankings, and teaching evaluations. However, traditional averaging methods can be affected by participation bias, where some raters do not participate in the rating process, leading to potential distortions. In this paper, we consider a robust rating aggregation task under participation bias. We assume that raters may not reveal their ratings with a certain probability depending on their individual ratings, resulting in partially observed samples. Our goal is to minimize the expected squared loss between the aggregated ratings and the average of all underlying ratings (possibly unobserved) in the worst-case scenario. We focus on two settings based on whether the sample size (i.e., the number of raters) is known. In the first setting, where the sample size is known, we propose an aggregator, named the Balanced Extremes Aggregator, which estimates unrevealed ratings with a balanced combination of extreme ratings. When the sample size is unknown, we derive another aggregator, the Polarizing-Averaging Aggregator, which becomes optimal as the sample size grows to infinity. Numerical results demonstrate the superiority of our proposed aggregators in mitigating participation bias, compared to simple averaging and the spectral method. Furthermore, we validate the effectiveness of our aggregators on a real-world dataset.
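To make the known-sample-size setting concrete, the following is a minimal sketch of the idea behind the Balanced Extremes Aggregator as described above: each unrevealed rating is imputed with a convex combination of the observed extremes, and the result is averaged over all n raters. The balance weight `lam` here is a hypothetical illustration parameter, not the paper's derived optimal weight, and the function name is our own.

```python
import numpy as np

def balanced_extremes_aggregate(observed, n, lam=0.5):
    """Illustrative sketch only: impute each of the (n - k) unrevealed
    ratings with a balanced combination of the observed extremes, then
    average over all n slots. `lam` is a hypothetical balance weight,
    not the weight derived in the paper."""
    observed = np.asarray(observed, dtype=float)
    k = observed.size
    if k == 0 or k > n:
        raise ValueError("need 1 <= len(observed) <= n")
    # Balanced combination of the most extreme observed ratings.
    imputed = lam * observed.min() + (1.0 - lam) * observed.max()
    # Average over all n underlying ratings, observed and imputed.
    return (observed.sum() + (n - k) * imputed) / n
```

With `lam=0.5`, two missing raters out of four are each imputed at the midpoint of the observed extremes, so `balanced_extremes_aggregate([1.0, 3.0], n=4)` returns 2.0; a simple average of the observed ratings alone would ignore the two nonparticipants entirely.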