One Refiner to Unlock Them All: Inference-Time Reasoning Elicitation via Reinforcement Query Refinement

📅 2026-04-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the observation that large language models (LLMs) often fail to use their latent reasoning capabilities because of a distributional mismatch between ambiguous human queries and the structured logic the models need. The authors propose ReQueR (Reinforcement Query Refinement), a modular framework that trains a Refiner policy with reinforcement learning and applies it at inference time to rewrite raw queries into explicit logical decompositions, treating frozen LLMs as the environment. To stabilize training, ReQueR introduces the Adaptive Solver Hierarchy, a curriculum mechanism rooted in Vygotsky's Zone of Proximal Development that dynamically matches environmental difficulty to the Refiner's evolving competence. The approach avoids both the cost of fine-tuning each model individually and the limits of static prompting. Experiments report consistent absolute gains of 1.7% to 7.2% across diverse architectures and benchmarks, outperforming strong baselines by 2.1% on average, and a single Refiner trained on a small set of models transfers to diverse unseen models.

📝 Abstract

Large Language Models (LLMs) often fail to utilize their latent reasoning capabilities due to a distributional mismatch between ambiguous human inquiries and the structured logic required for machine activation. Existing alignment methods either incur prohibitive $O(N)$ costs by fine-tuning each model individually or rely on static prompts that fail to resolve query-level structural complexity. In this paper, we propose ReQueR (Reinforcement Query Refinement), a modular framework that treats reasoning elicitation as an inference-time alignment task. We train a specialized Refiner policy via Reinforcement Learning to rewrite raw queries into explicit logical decompositions, treating frozen LLMs as the environment. Rooted in the classical Zone of Proximal Development from educational psychology, we introduce the Adaptive Solver Hierarchy, a curriculum mechanism that stabilizes training by dynamically aligning environmental difficulty with the Refiner's evolving competence. ReQueR yields consistent absolute gains of 1.7% to 7.2% across diverse architectures and benchmarks, outperforming strong baselines by 2.1% on average. Crucially, it provides a promising paradigm for one-to-many inference-time reasoning elicitation, enabling a single Refiner trained on a small set of models to effectively unlock reasoning in diverse unseen models. Code is available at https://github.com/newera-xiao/ReQueR.
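
The abstract does not say how the Adaptive Solver Hierarchy chooses solvers or how the reward is computed, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' implementation. It assumes a binary correctness reward from a frozen solver and a curriculum rule that picks the solver whose recent success rate is closest to a target value, as one possible reading of "aligning environmental difficulty with the Refiner's competence". All names (`SolverStats`, `pick_solver`, `reward`, the `target` and `window` parameters) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SolverStats:
    """Running record of rewards obtained with one frozen solver (hypothetical)."""
    outcomes: List[int] = field(default_factory=list)

    def success_rate(self, window: int = 50) -> float:
        recent = self.outcomes[-window:]
        # Assumption: with no history, treat the solver as medium difficulty.
        return sum(recent) / len(recent) if recent else 0.5


def pick_solver(stats: Dict[str, SolverStats], target: float = 0.5) -> str:
    """Assumed curriculum rule: choose the solver whose recent success rate
    is closest to `target`, i.e. neither too easy nor too hard right now."""
    return min(stats, key=lambda name: abs(stats[name].success_rate() - target))


def reward(
    refine: Callable[[str], str],
    solve: Callable[[str], str],
    query: str,
    gold: str,
) -> float:
    """Assumed binary reward: 1.0 if the frozen solver answers the refined
    query correctly, else 0.0. The solver itself is never updated."""
    return 1.0 if solve(refine(query)).strip() == gold.strip() else 0.0
```

In a training loop built on this sketch, each step would call `pick_solver`, score the Refiner's rewrite with `reward`, append the result to that solver's `outcomes`, and pass the reward to whichever policy-gradient update is used for the Refiner.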

Problem

Research questions and friction points this paper is trying to address.

reasoning elicitation

query refinement

distributional mismatch

inference-time alignment

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning

Query Refinement

Inference-Time Alignment