Incentivizing Truthful Language Models via Peer Elicitation Games

📅 2025-05-19
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) suffer from factual inconsistency and hallucination, and existing trustworthy-alignment methods predominantly rely on supervised fine-tuning. This paper proposes PEG, a training-free, unsupervised game-theoretic framework: it sets up an egalitarian game between a generator and multiple heterogeneous discriminators, using a determinant-based mutual information score as a label-free reward that incentivizes factual generation. The authors prove that PEG achieves sublinear regret and last-iterate convergence to a truthful Nash equilibrium. To their knowledge, this is the first work to apply egalitarian game dynamics to zero-shot trustworthy alignment of LLMs. Empirical evaluation across multiple factuality benchmarks demonstrates significant accuracy improvements, validating that PEG robustly elicits factual outputs without annotated data or parameter updates.

๐Ÿ“ Abstract
Large Language Models (LLMs) have demonstrated strong generative capabilities but remain prone to inconsistencies and hallucinations. We introduce Peer Elicitation Games (PEG), a training-free, game-theoretic framework for aligning LLMs through a peer elicitation mechanism involving a generator and multiple discriminators instantiated from distinct base models. Discriminators interact in a peer evaluation setting, where rewards are computed using a determinant-based mutual information score that provably incentivizes truthful reporting without requiring ground-truth labels. We establish theoretical guarantees showing that each agent, via online learning, achieves sublinear regret, in the sense that its cumulative performance approaches that of the best fixed truthful strategy in hindsight. Moreover, we prove last-iterate convergence to a truthful Nash equilibrium, ensuring that the actual policies used by agents converge to stable and truthful behavior over time. Empirical evaluations across multiple benchmarks demonstrate significant improvements in factual accuracy. These results position PEG as a practical approach for eliciting truthful behavior from LLMs without supervision or fine-tuning.
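The determinant-based mutual information reward at the heart of the mechanism can be sketched in a few lines. The following is a minimal illustration in the style of DMI-style peer-prediction payments (Kong, 2020), not the authors' exact implementation; the function name, the binary true/false verdict setup, and the equal-halves task split are assumptions made for the sketch:

```python
import numpy as np

def dmi_reward(reports_a, reports_b, num_classes=2):
    """DMI-style mutual information score between two discriminators'
    verdicts over the same set of questions.

    reports_a, reports_b: equal-length sequences of class labels in
    {0, ..., num_classes - 1}. The task set is split into two halves,
    a joint-count matrix is built for each half, and the reward is the
    product of the two determinants. In expectation, truthful reporting
    maximizes this score without any ground-truth labels.
    """
    a = np.asarray(reports_a)
    b = np.asarray(reports_b)
    assert len(a) == len(b) and len(a) >= 2 * num_classes
    half = len(a) // 2

    def joint_counts(x, y):
        # m[i, j] = number of tasks where agent A reported i and B reported j
        m = np.zeros((num_classes, num_classes))
        for i, j in zip(x, y):
            m[i, j] += 1
        return m

    m1 = joint_counts(a[:half], b[:half])
    m2 = joint_counts(a[half:], b[half:])
    return float(np.linalg.det(m1) * np.linalg.det(m2))
```

For intuition: perfectly agreeing discriminators (e.g. both report `[0, 1, 0, 1, 0, 1, 0, 1]`) earn a strictly positive reward, while statistically independent reports yield zero in expectation, so uninformative or random "reviewing" is not profitable.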
Problem

Research questions and friction points this paper is trying to address.

Aligning LLMs to reduce inconsistencies and hallucinations
Incentivizing truthful reporting without ground-truth labels
Ensuring convergence to stable and truthful behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Peer Elicitation Games align LLMs without training
Determinant-based mutual information score incentivizes truthfulness
Theoretical guarantees ensure convergence to truthful Nash equilibrium
Baiting Chen
Department of Statistics and Data Science, UCLA
Tong Zhu
Department of Biostatistics, UCLA
Jiale Han
The Hong Kong University of Science and Technology
Lexin Li
Professor of Biostatistics, University of California, Berkeley
Gang Li
Department of Biostatistics, UCLA
Xiaowu Dai
Departments of Statistics and Data Science, and of Biostatistics, UCLA