Self-Questioning Language Models

📅 2025-08-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates whether large language models (LLMs) can continuously improve their reasoning capabilities through self-generated questions and answers, without external labeled data. Method: The authors propose an asymmetric self-play framework comprising a question-poser module that autonomously constructs domain-specific problems (e.g., algebraic word problems) and a solver module that generates solutions; correctness is evaluated via majority voting or self-generated unit tests, and both modules are jointly optimized with reinforcement learning. Contribution/Results: The approach is the first to generate training signals end to end from topic-level prompts alone, eliminating reliance on human annotations or pre-curated datasets. Experiments on three-digit multiplication, algebraic reasoning, and competitive programming tasks demonstrate substantial performance gains, validating the effectiveness and scalability of unsupervised self-improvement for reasoning.
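For the coding domain, the summary notes that correctness is checked with self-generated unit tests: the proposer emits tests alongside the problem, and the solver's program is scored by running them. A minimal sketch of that verification step (the function name and the 0/1 reward scheme are illustrative assumptions, not the paper's implementation; a real training loop would sandbox the execution):

```python
def unit_test_reward(candidate_code: str, tests: list[str]) -> float:
    """Score a solver's program by running proposer-generated unit tests.

    Returns 1.0 if the code executes and passes every test, 0.0 otherwise.
    Sketch only: `exec` on untrusted strings must be sandboxed in practice.
    """
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)   # define the solver's functions
        for test in tests:
            exec(test, namespace)         # each test is an assert statement
    except Exception:
        return 0.0
    return 1.0

# Example: the solver writes add(), the proposer supplies the checks
code = "def add(a, b):\n    return a + b"
tests = ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"]
print(unit_test_reward(code, tests))  # 1.0
```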

📝 Abstract
Can large language models improve without external data -- by generating their own questions and answers? We hypothesize that a pre-trained language model can improve its reasoning skills given only a single prompt specifying the topic (e.g., algebra word problems) and asking the model to generate its own questions. To do this, we propose Self-Questioning Language Models (SQLM): an asymmetric self-play framework where a proposer is given the topic and generates a question for a solver, who tries to answer it. Both the proposer and solver are trained via reinforcement learning. The proposer receives a reward if the problem is not too easy or too difficult, and the solver receives a reward based on majority voting, a proxy for correctness in the absence of ground-truth answers. For coding, the proposer can instead generate unit tests which are used for verification. We study this asymmetric self-play framework on three benchmarks: three-digit multiplication, algebra problems from the OMEGA benchmark, and programming problems from Codeforces. By continually generating more interesting problems and attempting to solve them, language models can improve on downstream benchmarks without access to any curated training datasets.
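The majority-voting proxy from the abstract (rewarding a solver sample when it agrees with the plurality answer across samples) can be sketched in a few lines. The sample count and 0/1 reward values here are illustrative assumptions, not the paper's exact setup:

```python
from collections import Counter

def majority_vote_rewards(answers: list[str]) -> list[float]:
    """Reward each sampled solver answer by agreement with the plurality answer.

    With no ground-truth label available, the most common answer among the
    samples serves as a proxy for correctness (self-consistency).
    """
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]

# Five solver samples for one self-generated question
print(majority_vote_rewards(["42", "42", "41", "42", "40"]))
# [1.0, 1.0, 0.0, 1.0, 0.0]
```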
Problem

Research questions and friction points this paper is trying to address.

Can language models self-improve via self-generated questions?
Proposing and solving questions without external data
Enhancing reasoning skills through asymmetric self-play
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Questioning Language Models framework
Asymmetric self-play with reinforcement learning
Generates and verifies questions autonomously
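The proposer's incentive, a reward when the generated problem is neither too easy nor too hard, can be approximated from the solver's empirical solve rate. A minimal sketch, assuming a solve-rate band; the thresholds 0.2 and 0.8 are illustrative, not values from the paper:

```python
def proposer_reward(solver_rewards: list[float],
                    low: float = 0.2, high: float = 0.8) -> float:
    """Reward the proposer only when its question has intermediate difficulty.

    solver_rewards: per-sample correctness proxies (e.g. majority-vote
    agreement). A solve rate near 0 means the question is too hard for the
    current solver; near 1, too easy to drive learning.
    """
    solve_rate = sum(solver_rewards) / len(solver_rewards)
    return 1.0 if low < solve_rate < high else 0.0

print(proposer_reward([1.0, 1.0, 0.0, 1.0, 0.0]))  # solve rate 0.6 -> 1.0
print(proposer_reward([1.0, 1.0, 1.0, 1.0, 1.0]))  # too easy -> 0.0
```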
Authors
Lili Chen
Carnegie Mellon University
Mihir Prabhudesai
PhD Student, CMU Robotics
Katerina Fragkiadaki
Associate Professor, Carnegie Mellon University (Computer Vision, Machine Learning, Language Grounding, Robotics)
Hao Liu
Carnegie Mellon University
Deepak Pathak
Carnegie Mellon University