Self-Questioning Language Models

📅 2025-08-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates whether large language models (LLMs) can continuously improve their reasoning capabilities through self-generated questions and answers, without external labeled data. Method: The authors propose an asymmetric self-play framework comprising a question-poser module that autonomously constructs domain-specific problems (e.g., algebraic word problems) and a solver module that generates solutions; correctness is evaluated via majority voting or self-generated unit tests, and both modules are jointly optimized with reinforcement learning. Contribution/Results: The approach is the first to generate training signals end to end from topic-level prompts alone, eliminating reliance on human annotations or pre-curated datasets. Experiments on three-digit multiplication, algebraic reasoning, and competitive programming tasks demonstrate substantial performance gains, validating the effectiveness and scalability of unsupervised self-improvement for reasoning.
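For the coding domain, the summary notes that correctness is checked with self-generated unit tests: the proposer emits tests alongside the problem, and the solver's program is scored by running them. A minimal sketch of that verification step (the function name and the 0/1 reward scheme are illustrative assumptions, not the paper's implementation; a real training loop would sandbox the execution):

```python
def unit_test_reward(candidate_code: str, tests: list[str]) -> float:
    """Score a solver's program by running proposer-generated unit tests.

    Returns 1.0 if the code executes and passes every test, 0.0 otherwise.
    Sketch only: `exec` on untrusted strings must be sandboxed in practice.
    """
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)   # define the solver's functions
        for test in tests:
            exec(test, namespace)         # each test is an assert statement
    except Exception:
        return 0.0
    return 1.0

# Example: the solver writes add(), the proposer supplies the checks
code = "def add(a, b):\n    return a + b"
tests = ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"]
print(unit_test_reward(code, tests))  # 1.0
```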

📝 Abstract
Can large language models improve without external data -- by generating their own questions and answers? We hypothesize that a pre-trained language model can improve its reasoning skills given only a single prompt specifying the topic (e.g., algebra word problems) and asking the model to generate its own questions. To do this, we propose Self-Questioning Language Models (SQLM): an asymmetric self-play framework where a proposer is given the topic and generates a question for a solver, who tries to answer it. Both the proposer and solver are trained via reinforcement learning. The proposer receives a reward if the problem is not too easy or too difficult, and the solver receives a reward based on majority voting, a proxy for correctness in the absence of ground-truth answers. For coding, the proposer can instead generate unit tests which are used for verification. We study this asymmetric self-play framework on three benchmarks: three-digit multiplication, algebra problems from the OMEGA benchmark, and programming problems from Codeforces. By continually generating more interesting problems and attempting to solve them, language models can improve on downstream benchmarks without access to any curated training datasets.
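The majority-voting proxy from the abstract (rewarding a solver sample when it agrees with the plurality answer across samples) can be sketched in a few lines. The sample count and 0/1 reward values here are illustrative assumptions, not the paper's exact setup:

```python
from collections import Counter

def majority_vote_rewards(answers: list[str]) -> list[float]:
    """Reward each sampled solver answer by agreement with the plurality answer.

    With no ground-truth label available, the most common answer among the
    samples serves as a proxy for correctness (self-consistency).
    """
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]

# Five solver samples for one self-generated question
print(majority_vote_rewards(["42", "42", "41", "42", "40"]))
# [1.0, 1.0, 0.0, 1.0, 0.0]
```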
Problem

Research questions and friction points this paper is trying to address.

Can language models self-improve via self-generated questions?
Proposing and solving questions without external data
Enhancing reasoning skills through asymmetric self-play
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Questioning Language Models framework
Asymmetric self-play with reinforcement learning
Generates and verifies questions autonomously
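The proposer's incentive, a reward when the generated problem is neither too easy nor too hard, can be approximated from the solver's empirical solve rate. A minimal sketch, assuming a solve-rate band; the thresholds 0.2 and 0.8 are illustrative, not values from the paper:

```python
def proposer_reward(solver_rewards: list[float],
                    low: float = 0.2, high: float = 0.8) -> float:
    """Reward the proposer only when its question has intermediate difficulty.

    solver_rewards: per-sample correctness proxies (e.g. majority-vote
    agreement). A solve rate near 0 means the question is too hard for the
    current solver; near 1, too easy to drive learning.
    """
    solve_rate = sum(solver_rewards) / len(solver_rewards)
    return 1.0 if low < solve_rate < high else 0.0

print(proposer_reward([1.0, 1.0, 0.0, 1.0, 0.0]))  # solve rate 0.6 -> 1.0
print(proposer_reward([1.0, 1.0, 1.0, 1.0, 1.0]))  # too easy -> 0.0
```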
Authors
Lili Chen
Carnegie Mellon University
Mihir Prabhudesai
PhD Student, CMU Robotics
Katerina Fragkiadaki
Associate Professor, Carnegie Mellon University (Computer Vision, Machine Learning, Language Grounding, Robotics)
Hao Liu
Carnegie Mellon University
Deepak Pathak
Carnegie Mellon University