Evaluating Agents using Social Choice Theory

📅 2023-12-05

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

To address the lack of fairness, interpretability, and robustness in cross-domain agent evaluation, this paper proposes the Voting-as-Evaluation (VasE) paradigm, which frames multi-task evaluation as a social choice problem: each task acts as a “voter,” and a global ranking is aggregated from the tasks' ordinal rankings or pairwise comparisons of agents. On the theoretical side, the paper singles out the maximal lotteries method, which satisfies key axiomatic properties (e.g., Condorcet consistency), is computable in time polynomial in the size of the evaluation data, and identifies game-theoretic cycles. Empirically, VasE is applied to reinforcement learning agents, large language models, and humans. The authors report that VasE can be more robust than Elo and Nash averaging, discovers properties of the evaluation data that are not evident from scores alone, and predicts outcomes better than Elo in a complex seven-player game.

📝 Abstract

We argue that many general evaluation problems can be viewed through the lens of voting theory. Each task is interpreted as a separate voter, which requires only ordinal rankings or pairwise comparisons of agents to produce an overall evaluation. By viewing the aggregator as a social welfare function, we are able to leverage centuries of research in social choice theory to derive principled evaluation frameworks with axiomatic foundations. These evaluations are interpretable and flexible, while avoiding many of the problems currently facing cross-task evaluation. We apply this Voting-as-Evaluation (VasE) framework across multiple settings, including reinforcement learning, large language models, and humans. In practice, we observe that VasE can be more robust than popular evaluation frameworks (Elo and Nash averaging), discovers properties in the evaluation data not evident from scores alone, and can predict outcomes better than Elo in a complex seven-player game. We identify one particular approach, maximal lotteries, that satisfies important consistency properties relevant to evaluation, is computationally efficient (polynomial in the size of the evaluation data), and identifies game-theoretic cycles.
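The task-as-voter idea can be illustrated with a minimal sketch. It is not the authors' implementation: the agent names, the toy rankings, and the helper functions `margin_matrix` and `condorcet_winner` are all hypothetical, and every task is assumed to give a complete, strict ranking of the agents. The sketch builds the pairwise margin matrix from per-task rankings and looks for a Condorcet winner, an agent that beats every other agent head-to-head. A maximal lottery is an optimal strategy of the symmetric zero-sum game defined by this margin matrix. When a Condorcet winner exists, that lottery puts all of its probability on the winner. When the preferences cycle, no such winner exists and the lottery spreads probability over several agents.

```python
from itertools import combinations


def margin_matrix(rankings, agents):
    """margin[a][b] = (# tasks ranking a above b) - (# tasks ranking b above a).

    Assumes each ranking is a complete, strict ordering of `agents`, best first.
    """
    margin = {a: {b: 0 for b in agents} for a in agents}
    for ranking in rankings:
        position = {agent: i for i, agent in enumerate(ranking)}
        for a, b in combinations(agents, 2):
            sign = 1 if position[a] < position[b] else -1
            margin[a][b] += sign
            margin[b][a] -= sign
    return margin


def condorcet_winner(margin, agents):
    """Return the agent that beats every other agent head-to-head, or None."""
    for a in agents:
        if all(margin[a][b] > 0 for b in agents if b != a):
            return a
    return None


# Hypothetical data: three tasks ("voters"), each ranking three agents.
agents = ["A", "B", "C"]
task_rankings = [
    ["A", "B", "C"],
    ["A", "C", "B"],
    ["B", "A", "C"],
]
margin = margin_matrix(task_rankings, agents)
winner = condorcet_winner(margin, agents)

# A cyclic profile: A beats B, B beats C, and C beats A, so nobody wins outright.
cyclic_rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]
cyclic_margin = margin_matrix(cyclic_rankings, agents)
cyclic_winner = condorcet_winner(cyclic_margin, agents)

print(winner, cyclic_winner)
```

In the cyclic case, computing the actual lottery would require solving the zero-sum game, for example with a linear program, which this sketch leaves out.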

Problem

Research questions and friction points this paper is trying to address.

Agent Evaluation

Cross-domain Tasks

Fair Assessment System

Innovation

Methods, ideas, or system contributions that make the work stand out.

VasE Evaluation Method

Social Choice Theory

Maximal Lotteries Technique
