AI Summary
Large language models (LLMs) suffer from factual inconsistency and hallucination, and existing trustworthy-alignment methods predominantly rely on supervised fine-tuning. This paper proposes PEG, a training-free, unsupervised game-theoretic framework: it establishes a peer elicitation game between a generator and multiple heterogeneous discriminators, employing a determinant-based mutual information score as a label-free reward that incentivizes factual generation. The authors prove that PEG achieves sublinear regret and exhibits last-iterate convergence to a truthful Nash equilibrium. To their knowledge, this is the first work to apply peer elicitation game dynamics to the zero-shot trustworthy alignment of LLMs. Empirical evaluation across multiple factuality benchmarks demonstrates significant accuracy improvements, validating that PEG robustly elicits factual outputs without annotated data or parameter updates.
Abstract
Large Language Models (LLMs) have demonstrated strong generative capabilities but remain prone to inconsistencies and hallucinations. We introduce Peer Elicitation Games (PEG), a training-free, game-theoretic framework for aligning LLMs through a peer elicitation mechanism involving a generator and multiple discriminators instantiated from distinct base models. Discriminators interact in a peer evaluation setting, where rewards are computed using a determinant-based mutual information score that provably incentivizes truthful reporting without requiring ground-truth labels. We establish theoretical guarantees showing that each agent, via online learning, achieves sublinear regret, in the sense that its cumulative performance approaches that of the best fixed truthful strategy in hindsight. Moreover, we prove last-iterate convergence to a truthful Nash equilibrium, ensuring that the actual policies used by agents converge to stable and truthful behavior over time. Empirical evaluations across multiple benchmarks demonstrate significant improvements in factual accuracy. These results position PEG as a practical approach for eliciting truthful behavior from LLMs without supervision or fine-tuning.
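The determinant-based mutual information (DMI) reward mentioned above can be sketched concretely. Below is a minimal, hypothetical illustration (the paper's exact formulation may differ): two discriminators each report binary verdicts on a shared batch of generator outputs, the batch is split into two disjoint halves, a joint count matrix is built on each half, and the reward is the product of the two determinants. Agreement driven by a shared underlying signal yields a positive reward, while uninformative (e.g. constant) reporting yields zero in expectation, which is what removes the need for ground-truth labels.

```python
import numpy as np

def dmi_reward(reports_a, reports_b, num_labels=2):
    """Hypothetical sketch of a determinant-based mutual information reward.

    reports_a, reports_b: integer label sequences (one verdict per task)
    from two peer discriminators judging the same generator outputs.
    """
    reports_a = np.asarray(reports_a)
    reports_b = np.asarray(reports_b)
    mid = len(reports_a) // 2  # split the batch into two disjoint halves

    def joint_counts(a, b):
        # Joint count matrix: m[x, y] = number of tasks where
        # discriminator A said x and discriminator B said y.
        m = np.zeros((num_labels, num_labels))
        for x, y in zip(a, b):
            m[x, y] += 1
        return m

    m1 = joint_counts(reports_a[:mid], reports_b[:mid])
    m2 = joint_counts(reports_a[mid:], reports_b[mid:])
    # Product of determinants over the two halves: positive when the
    # reports carry shared information, near zero for strategies that
    # ignore the tasks (e.g. always answering the same label).
    return np.linalg.det(m1) * np.linalg.det(m2)
```

For instance, two discriminators whose verdicts track the same underlying facts earn a positive reward, while a discriminator that always outputs the same label produces a singular count matrix and earns zero, so truthful effort is the paying strategy.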