Incentivizing Truthful Language Models via Peer Elicitation Games

📅 2025-05-19
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) suffer from factual inconsistency and hallucination, and existing trustworthy-alignment methods predominantly rely on supervised fine-tuning. This paper proposes PEG, a training-free, unsupervised game-theoretic framework: it sets up an egalitarian game between a generator and multiple heterogeneous discriminators, using a determinant-based mutual information score as a label-free reward that incentivizes factual generation. The authors prove that PEG achieves sublinear regret and last-iterate convergence to a truthful Nash equilibrium. To their knowledge, this is the first work to apply egalitarian game dynamics to zero-shot trustworthy alignment of LLMs. Empirical evaluation across multiple factuality benchmarks demonstrates significant accuracy improvements, validating that PEG robustly elicits factual outputs without annotated data or parameter updates.

๐Ÿ“ Abstract
Large Language Models (LLMs) have demonstrated strong generative capabilities but remain prone to inconsistencies and hallucinations. We introduce Peer Elicitation Games (PEG), a training-free, game-theoretic framework for aligning LLMs through a peer elicitation mechanism involving a generator and multiple discriminators instantiated from distinct base models. Discriminators interact in a peer evaluation setting, where rewards are computed using a determinant-based mutual information score that provably incentivizes truthful reporting without requiring ground-truth labels. We establish theoretical guarantees showing that each agent, via online learning, achieves sublinear regret, in the sense that its cumulative performance approaches that of the best fixed truthful strategy in hindsight. Moreover, we prove last-iterate convergence to a truthful Nash equilibrium, ensuring that the actual policies used by agents converge to stable and truthful behavior over time. Empirical evaluations across multiple benchmarks demonstrate significant improvements in factual accuracy. These results position PEG as a practical approach for eliciting truthful behavior from LLMs without supervision or fine-tuning.
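The determinant-based mutual information reward at the heart of the mechanism can be sketched in a few lines. The following is a minimal illustration in the style of DMI-style peer-prediction payments (Kong, 2020), not the authors' exact implementation; the function name, the binary true/false verdict setup, and the equal-halves task split are assumptions made for the sketch:

```python
import numpy as np

def dmi_reward(reports_a, reports_b, num_classes=2):
    """DMI-style mutual information score between two discriminators'
    verdicts over the same set of questions.

    reports_a, reports_b: equal-length sequences of class labels in
    {0, ..., num_classes - 1}. The task set is split into two halves,
    a joint-count matrix is built for each half, and the reward is the
    product of the two determinants. In expectation, truthful reporting
    maximizes this score without any ground-truth labels.
    """
    a = np.asarray(reports_a)
    b = np.asarray(reports_b)
    assert len(a) == len(b) and len(a) >= 2 * num_classes
    half = len(a) // 2

    def joint_counts(x, y):
        # m[i, j] = number of tasks where agent A reported i and B reported j
        m = np.zeros((num_classes, num_classes))
        for i, j in zip(x, y):
            m[i, j] += 1
        return m

    m1 = joint_counts(a[:half], b[:half])
    m2 = joint_counts(a[half:], b[half:])
    return float(np.linalg.det(m1) * np.linalg.det(m2))
```

For intuition: perfectly agreeing discriminators (e.g. both report `[0, 1, 0, 1, 0, 1, 0, 1]`) earn a strictly positive reward, while statistically independent reports yield zero in expectation, so uninformative or random "reviewing" is not profitable.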
Problem

Research questions and friction points this paper is trying to address.

Aligning LLMs to reduce inconsistencies and hallucinations
Incentivizing truthful reporting without ground-truth labels
Ensuring convergence to stable and truthful behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Peer Elicitation Games align LLMs without training
Determinant-based mutual information score incentivizes truthfulness
Theoretical guarantees ensure convergence to truthful Nash equilibrium
Baiting Chen
Department of Statistics and Data Science, UCLA
Tong Zhu
Department of Biostatistics, UCLA
Jiale Han
The Hong Kong University of Science and Technology
Lexin Li
Professor of Biostatistics, University of California, Berkeley
Gang Li
Department of Biostatistics, UCLA
Xiaowu Dai
Departments of Statistics and Data Science, and of Biostatistics, UCLA