🤖 AI Summary
While large language models (LLMs) exhibit reasoning capabilities, they lack deep introspection into and rational regulation of their own reasoning processes.
Method: We propose the unsupervised Critic-Discernment Game (CDG), a self-play framework wherein an LLM autonomously generates constructive critiques and misleading distractors, then discerns errors, refines reasoning paths, and enhances response consistency—without external annotations or stronger supervisory models. CDG integrates adversarial critique generation, error identification learning, and multi-turn consistency optimization.
Contribution/Results: CDG significantly improves mathematical reasoning, long-chain deduction, and stepwise error correction. Experiments demonstrate substantial gains in self-correction rate and accuracy across multiple complex reasoning benchmarks, including GSM8K, MATH, and ProofWriter. Notably, CDG is the first approach to enhance reasoning rationality without supervision from humans or stronger models, establishing a novel paradigm for endogenous rationality in language models.
📝 Abstract
Large language models (LLMs) have demonstrated considerable reasoning abilities in various tasks such as mathematics and coding. However, recent studies indicate that even the best models lack true comprehension of their own reasoning processes. In this paper, we explore how self-play can enhance the rationality of models in the reasoning process without supervision from humans or superior models. We design a Critic-Discernment Game (CDG) in which a prover first provides a solution to a given problem and is subsequently challenged by critiques of its solution. These critiques aim either to assist or to mislead the prover. The prover's objective is to maintain the correct answer when faced with misleading comments, while correcting genuine errors in response to constructive feedback. Our experiments on tasks involving mathematical reasoning, stepwise error detection, self-correction, and long-chain reasoning demonstrate that CDG training can significantly improve the ability of well-aligned LLMs to comprehend their own reasoning process.
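The game structure described in the abstract can be sketched with toy stand-ins for each role. Every function below is a hypothetical placeholder, not the paper's implementation: in a real CDG round, an LLM would play both the prover and the critic, and training would reward the prover for revising under constructive critiques while holding firm against distractors.

```python
def prover_solve(numbers, buggy=False):
    """Toy prover: sums the numbers; buggy=True drops the last
    term to simulate a reasoning error in the solution."""
    return sum(numbers[:-1]) if buggy else sum(numbers)

def critic(numbers, answer, mislead=False):
    """Toy critic: a constructive critique flags a genuinely wrong
    answer; a misleading distractor disputes a correct one."""
    if mislead:
        return "Your total looks too high; recheck the last term."
    if answer != sum(numbers):
        return "You seem to have skipped a term."
    return None

def discern_and_respond(numbers, answer, critique):
    """Toy discernment: the prover re-derives the answer from
    scratch instead of trusting the critique blindly, so it fixes
    real errors but keeps a correct answer under a distractor."""
    if critique is None:
        return answer
    return sum(numbers)

nums = [3, 5, 7]
wrong = prover_solve(nums, buggy=True)   # erroneous solution: 8
fixed = discern_and_respond(nums, wrong, critic(nums, wrong))
right = prover_solve(nums)               # correct solution: 15
kept = discern_and_respond(nums, right, critic(nums, right, mislead=True))
```

Here `fixed` recovers the correct sum after a constructive critique, and `kept` stays at the correct answer despite the misleading one; the paper's training objective rewards exactly this discernment behavior in the prover.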