RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems

📅 2025-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models struggle to effectively capture and reuse procedural knowledge in complex reasoning, leading to inefficient exploration and excessively long reasoning paths. To address this, we propose RLAD, a two-stage reinforcement learning framework that decouples “abstraction generation” from “answer generation.” By modeling reasoning abstractions in natural language, RLAD shifts the model’s reasoning paradigm from pattern matching toward algorithmic thinking. Key technical components include reinforcement learning–based post-training, long-chain-of-thought sampling, and two-stage policy optimization. Experiments demonstrate that prioritizing computational resources toward abstraction generation—rather than exhaustive answer enumeration—significantly improves both reasoning efficiency and out-of-distribution generalization. Consistent gains are observed on high-difficulty and unseen problems, validating reasoning abstraction as a critical structured exploration guidance mechanism.

📝 Abstract
Reasoning requires going beyond pattern matching or memorization of solutions to identify and implement "algorithmic procedures" that can be used to deduce answers to hard problems. Doing so requires realizing the most relevant primitives, intermediate results, or shared procedures, and building upon them. While RL post-training on long chains of thought ultimately aims to uncover this kind of algorithmic behavior, most reasoning traces learned by large models fail to consistently capture or reuse procedures, instead drifting into verbose and degenerate exploration. To address more effective reasoning, we introduce reasoning abstractions: concise natural language descriptions of procedural and factual knowledge that guide the model toward learning successful reasoning. We train models to be capable of proposing multiple abstractions given a problem, followed by RL that incentivizes building a solution while using the information provided by these abstractions. This results in a two-player RL training paradigm, abbreviated as RLAD, that jointly trains an abstraction generator and a solution generator. This setup effectively enables structured exploration, decouples learning signals of abstraction proposal and solution generation, and improves generalization to harder problems. We also show that allocating more test-time compute to generating abstractions is more beneficial for performance than generating more solutions at large test budgets, illustrating the role of abstractions in guiding meaningful exploration.
Problem

Research questions and friction points this paper is trying to address.

Training models to discover reasoning abstractions for problem-solving
Enabling structured exploration through two-player RL training paradigm
Improving generalization to harder problems via abstraction-guided solution generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training models to generate reasoning abstractions in natural language
Using two-player RL to jointly train abstraction and solution generators
Decoupling learning signals for abstraction proposal and solution generation
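The two-player setup above can be sketched in a few lines. This is a minimal, hypothetical illustration of the decoupled reward flow only: the generator functions and the verifier below are stand-ins, not the paper's actual models, prompts, or reward design. The key idea shown is that the abstraction generator is credited with the aggregate success of the solutions it induced, while each solution is scored on its own.

```python
# Hypothetical sketch of one RLAD-style training step with decoupled
# rewards for the abstraction player and the solution player.

def propose_abstractions(problem: str, k: int = 3) -> list[str]:
    # Stand-in abstraction generator: in the real framework this would
    # sample k natural-language abstractions from an LLM policy.
    return [f"abstraction {i} for {problem}" for i in range(k)]

def generate_solution(problem: str, abstraction: str) -> str:
    # Stand-in solution generator conditioned on one abstraction.
    return f"solution to {problem} guided by {abstraction}"

def verify(problem: str, solution: str) -> float:
    # Stand-in verifier: returns 1.0 for a correct solution, else 0.0.
    return 1.0 if solution.startswith("solution") else 0.0

def rlad_step(problem: str) -> tuple[float, list[tuple[str, float]]]:
    """One simplified step: each solution gets its own reward, while the
    abstraction generator is rewarded with the mean success rate of the
    solutions its abstractions produced -- two decoupled learning signals."""
    abstractions = propose_abstractions(problem)
    scored_solutions = []
    for a in abstractions:
        s = generate_solution(problem, a)
        scored_solutions.append((s, verify(problem, s)))
    abstraction_reward = sum(r for _, r in scored_solutions) / len(scored_solutions)
    return abstraction_reward, scored_solutions

abs_reward, solutions = rlad_step("toy problem")
```

In an actual implementation, both `propose_abstractions` and `generate_solution` would be trainable LLM policies updated with these separate reward streams; the sketch only makes the credit-assignment structure concrete.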