🤖 AI Summary
In settings without ground-truth labels, incentivizing honest human feedback remains challenging, especially when agents have nonlinear utility functions, under which classical mechanisms that only guarantee truth-telling in expectation can lose their incentive guarantees.
Method: We introduce *stochastic dominance truthfulness* (SD-truthfulness), an incentive-compatibility criterion robust to nonlinear utilities: the score distribution of truth-telling must stochastically dominate that of every other strategy, so truthful reporting is optimal for any monotone utility function. Because rounding scores into binary lotteries enforces SD-truthfulness only at the cost of sensitivity, we study more careful rounding schemes and propose a new enforced agreement (EA) mechanism for binary-signal settings.
Contribution/Results: We prove that EA is SD-truthful in binary-signal settings under mild assumptions. Empirically, it achieves the highest sensitivity among all known SD-truthful mechanisms, a property tied to fairness and statistical efficiency. EA thus pairs a rigorous incentive-compatibility guarantee with practical performance for feedback elicitation without ground-truth verification.
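As a rough formalization of the SD-truthfulness criterion (the notation below is ours, a sketch of the standard first-order stochastic dominance condition rather than text from the paper):

```latex
% Let S_truth be the (random) score under truth-telling and S_sigma the score
% under any other strategy sigma. SD-truthfulness asks that S_truth
% first-order stochastically dominates S_sigma:
\Pr[S_{\mathrm{truth}} \ge t] \;\ge\; \Pr[S_{\sigma} \ge t] \quad \text{for all } t,
% which is equivalent to requiring, for every nondecreasing utility u,
\mathbb{E}\big[u(S_{\mathrm{truth}})\big] \;\ge\; \mathbb{E}\big[u(S_{\sigma})\big].
```

This equivalence is why SD-truthfulness covers agents with arbitrary monotone (possibly nonlinear) utilities, in contrast to the expected-score guarantee of traditional mechanisms.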
📝 Abstract
Eliciting reliable human feedback is essential for many machine learning tasks, such as learning from noisy labels and aligning AI systems with human preferences. Peer prediction mechanisms incentivize truthful reporting without ground truth verification by scoring agents based on correlations with peers. Traditional mechanisms, which ensure that truth-telling maximizes the expected score in equilibrium, can elicit honest information under the assumption that agents' utilities are linear functions of their scores. However, in practice, non-linear payment rules are usually preferred, or agents' utilities are inherently non-linear. We propose stochastically dominant truthfulness (SD-truthfulness) as a stronger guarantee: the score distribution of truth-telling stochastically dominates that of any other strategy, incentivizing truthful reporting for a wide range of monotone utility functions. Our first observation is that no existing peer prediction mechanism naturally satisfies this criterion without strong assumptions. A simple solution, rounding scores into binary lotteries, can enforce SD-truthfulness, but often degrades sensitivity, a key property related to fairness and statistical efficiency. We demonstrate how a more careful application of rounding can better preserve sensitivity. Furthermore, we introduce a new enforced agreement (EA) mechanism that is theoretically guaranteed to be SD-truthful in binary-signal settings under mild assumptions, and empirically achieves the highest sensitivity among all known SD-truthful mechanisms.
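To illustrate the binary-lottery rounding idea mentioned in the abstract, here is a minimal sketch (our own illustration, not the paper's implementation; it assumes scores fall in a known bounded range, and all names are hypothetical):

```python
import random


def to_binary_lottery(score: float, low: float = 0.0, high: float = 1.0) -> float:
    """Turn a bounded peer-prediction score into a binary lottery payoff.

    The score is mapped to a win probability p in [0, 1]; the agent then
    receives `high` with probability p and `low` otherwise. A binary lottery
    is fully described by p, so every monotone utility function ranks these
    lotteries the same way, which is what lets rounding enforce
    SD-truthfulness, at the cost of sensitivity, since the fine-grained
    score is collapsed into a single coin flip.
    """
    p = min(max((score - low) / (high - low), 0.0), 1.0)  # clamp to [0, 1]
    return high if random.random() < p else low


# Example: a raw score of 0.7 becomes a 70% chance of the high payoff.
payoff = to_binary_lottery(0.7)
```

Intuitively, if the underlying mechanism already makes truth-telling maximize the expected score, then truth-telling also maximizes the win probability of this two-outcome lottery, whose distribution therefore dominates that of any other strategy; the drawback, as the abstract notes, is the loss of sensitivity from discarding most of the score's information.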