Self-Steering Language Models

📅 2025-04-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Language models suffer from inefficient search, high error rates, and poor verifiability during test-time reasoning. To address these challenges, the authors propose DisCIPL, a "self-steering" reasoning framework: a Planner model uses LLM-based meta-reasoning to generate task-specific recursive search programs, and lightweight Follower models (e.g., Llama-3.2-1B) execute these programs in parallel, enabling large-scale Monte Carlo inference with no fine-tuning. By decoupling planning from execution, DisCIPL yields programmable, stepwise-verifiable inference. On constrained generation tasks, it matches or exceeds GPT-4o and o1 at substantially lower computational cost, establishing a new paradigm for efficient, reliable, and interpretable test-time reasoning in large language models.
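To make the decoupling concrete, here is a minimal sketch of the pattern the summary describes: a Planner-written verification function paired with many independent Follower rollouts, with programmatic (not judge-model) checking. All names here are hypothetical illustrations, not the paper's code; `follower_sample` stands in for a call to a small LM such as Llama-3.2-1B.

```python
import random
from typing import Optional

def verify(text: str) -> bool:
    """Task-specific check the Planner would write, e.g. a length constraint."""
    return len(text.split()) == 5

def follower_sample(rng: random.Random) -> str:
    """Stub for one Follower rollout (a real system would call a small LM)."""
    n_words = rng.randint(3, 7)
    return " ".join("word" for _ in range(n_words))

def best_of_n(n: int, seed: int = 0) -> Optional[str]:
    """Run n independent Follower rollouts; return the first verified one.

    In DisCIPL such rollouts run in parallel, and verification is stepwise
    and programmatic, so results are checkable without another LM as judge.
    """
    rng = random.Random(seed)
    for _ in range(n):
        candidate = follower_sample(rng)
        if verify(candidate):
            return candidate
    return None

result = best_of_n(32)
```

Because the verifier is plain code emitted by the Planner, success is decidable per candidate, which is what makes the inference "stepwise verifiable".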

📝 Abstract
While test-time reasoning enables language models to tackle complex tasks, searching or planning in natural language can be slow, costly, and error-prone. But even when LMs struggle to emulate the precise reasoning steps needed to solve a problem, they often excel at describing its abstract structure: both how to verify solutions and how to search for them. This paper introduces DisCIPL, a method for "self-steering" LMs where a Planner model generates a task-specific inference program that is executed by a population of Follower models. Our approach equips LMs with the ability to write recursive search procedures that guide LM inference, enabling new forms of verifiable and efficient reasoning. When instantiated with a small Follower (e.g., Llama-3.2-1B), DisCIPL matches (and sometimes outperforms) much larger models, including GPT-4o and o1, on challenging constrained generation tasks. In decoupling planning from execution, our work opens up a design space of highly-parallelized Monte Carlo inference strategies that outperform standard best-of-N sampling, require no finetuning, and can be implemented automatically by existing LMs.
Problem

Research questions and friction points this paper is trying to address.

Slow and error-prone natural language planning in LMs
Difficulty in precise reasoning steps despite abstract understanding
Need for efficient, verifiable LM-guided search procedures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Planner model generates task-specific inference programs
Follower models execute recursive search procedures
Decouples planning from execution for parallelized strategies
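The abstract notes that these parallelized Monte Carlo strategies can outperform plain best-of-N. One such strategy is sequential (particle-filter style) resampling, sketched below under stated assumptions: `extend` stands in for a Follower taking one token step, and `score` for a Planner-written partial-constraint check. This is an illustrative example of the technique, not the paper's implementation.

```python
import random

def extend(prefix, rng):
    """Stub Follower step: append one token (a real system calls a small LM)."""
    return prefix + [rng.choice(["a", "b"])]

def score(prefix):
    """Hypothetical Planner-written partial score: prefer sequences rich in 'a'."""
    return 1 + prefix.count("a")

def smc(n_particles=8, n_steps=4, seed=0):
    """Particle-filter style inference: extend, reweight, resample each step."""
    rng = random.Random(seed)
    particles = [[] for _ in range(n_particles)]
    for _ in range(n_steps):
        particles = [extend(p, rng) for p in particles]
        weights = [score(p) for p in particles]
        # Resample particles in proportion to their stepwise scores, so
        # compute concentrates on prefixes that still satisfy the constraint.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return max(particles, key=score)

best = smc()
```

Unlike best-of-N, which only checks finished samples, resampling at every step prunes failing prefixes early, which is where the efficiency gain over standard best-of-N comes from.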