One Refiner to Unlock Them All: Inference-Time Reasoning Elicitation via Reinforcement Query Refinement

📅 2026-04-28
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the problem that large language models (LLMs) often fail to use their latent reasoning capabilities because of a distributional mismatch between ambiguous human queries and the structured logic that models need. To bridge this gap, the authors propose ReQueR, a framework that uses reinforcement learning to train a single Refiner policy which, at inference time, rewrites raw queries into explicit logical decompositions, with frozen LLMs serving as the environment. Training is stabilized by the Adaptive Solver Hierarchy, a curriculum mechanism grounded in Vygotsky's Zone of Proximal Development that matches environmental difficulty to the Refiner's evolving competence. A Refiner trained on a small set of models can then be applied to diverse unseen models, avoiding both per-model fine-tuning and static prompting. Experiments report consistent absolute gains of 1.7% to 7.2% across multiple architectures and benchmarks, outperforming strong baselines by 2.1% on average.
📝 Abstract
Large Language Models (LLMs) often fail to utilize their latent reasoning capabilities due to a distributional mismatch between ambiguous human inquiries and the structured logic required for machine activation. Existing alignment methods either incur prohibitive O(N) costs by fine-tuning each model individually or rely on static prompts that fail to resolve query-level structural complexity. In this paper, we propose ReQueR (Reinforcement Query Refinement), a modular framework that treats reasoning elicitation as an inference-time alignment task. We train a specialized Refiner policy via Reinforcement Learning to rewrite raw queries into explicit logical decompositions, treating frozen LLMs as the environment. Rooted in the classical Zone of Proximal Development from educational psychology, we introduce the Adaptive Solver Hierarchy, a curriculum mechanism that stabilizes training by dynamically aligning environmental difficulty with the Refiner's evolving competence. ReQueR yields consistent absolute gains of 1.7% to 7.2% across diverse architectures and benchmarks, outperforming strong baselines by 2.1% on average. Crucially, it provides a promising paradigm for one-to-many inference-time reasoning elicitation, enabling a single Refiner trained on a small set of models to effectively unlock reasoning in diverse unseen models. Code is available at https://github.com/newera-xiao/ReQueR.
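The abstract describes the training signal only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a binary reward (the frozen solver answers the refined query correctly or not) and a simple threshold rule for moving between solver tiers. The names `reward`, `pick_tier`, `toy_refiner`, `toy_solver`, and the thresholds `low` and `high` are all hypothetical and chosen purely for illustration.

```python
from typing import Callable

# Stand-ins for the two roles named in the abstract (both hypothetical types).
Refiner = Callable[[str], str]  # trainable policy: raw query -> refined query
Solver = Callable[[str], str]   # frozen LLM acting as environment: query -> answer


def reward(refiner: Refiner, solver: Solver, query: str, gold: str) -> float:
    """Assumed binary reward: 1.0 if the frozen solver answers the
    refined query correctly, else 0.0."""
    answer = solver(refiner(query))
    return 1.0 if answer.strip() == gold.strip() else 0.0


def pick_tier(success_rate: float, tier: int, n_tiers: int,
              low: float = 0.3, high: float = 0.7) -> int:
    """Assumed curriculum rule: step to a harder solver tier when the
    refiner succeeds often, to an easier one when it rarely succeeds,
    and stay put while the success rate sits in the middle band."""
    if success_rate > high:
        return min(tier + 1, n_tiers - 1)
    if success_rate < low:
        return max(tier - 1, 0)
    return tier


def toy_refiner(query: str) -> str:
    # Toy policy: prepend an explicit two-step decomposition.
    return f"Step 1: restate the problem. Step 2: solve it. Problem: {query}"


def toy_solver(query: str) -> str:
    # Toy frozen model: only "reasons" when the query is decomposed.
    return "4" if query.startswith("Step 1:") else "unknown"


if __name__ == "__main__":
    q, gold = "What is 2+2?", "4"
    print(reward(toy_refiner, toy_solver, q, gold))    # refined query
    print(reward(lambda s: s, toy_solver, q, gold))    # raw query, no refinement
    print(pick_tier(success_rate=0.9, tier=0, n_tiers=3))
```

In a real setup the reward would feed a policy-gradient update of the Refiner while the solvers stay frozen; the update rule is omitted here because this summary does not specify it.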
Problem

Research questions and friction points this paper is trying to address.

reasoning elicitation
query refinement
distributional mismatch
inference-time alignment
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning
Query Refinement
Inference-Time Alignment
Modular Reasoning
Adaptive Curriculum
🔎 Similar Papers
2024-04-15 | Annual Meeting of the Association for Computational Linguistics | Citations: 4