ReThinker: Scientific Reasoning by Rethinking with Guided Reflection and Confidence Control

📅 2026-02-04
🤖 AI Summary
This work addresses the limitations of large language models on expert-level scientific reasoning tasks, such as those in the Humanity's Last Exam (HLE) benchmark, where fixed tool pipelines, brittle multi-agent coordination, and inefficient test-time scaling limit performance. The authors propose ReThinker, a confidence-aware agentic framework built on a stage-wise Solver-Critic-Selector architecture that allocates computation dynamically according to model confidence. This design supports adaptive tool invocation, guided multi-dimensional reflection, and confidence-weighted selection. For training without human annotation, the authors add a reverse data synthesis pipeline and an adaptive trajectory recycling strategy that turn successful reasoning traces into supervision. On HLE, GAIA, and XBench, ReThinker consistently outperforms state-of-the-art foundation models with tools and existing deep research systems.

📝 Abstract
Expert-level scientific reasoning remains challenging for large language models, particularly on benchmarks such as Humanity's Last Exam (HLE), where rigid tool pipelines, brittle multi-agent coordination, and inefficient test-time scaling often limit performance. We introduce ReThinker, a confidence-aware agentic framework that orchestrates retrieval, tool use, and multi-agent reasoning through a stage-wise Solver-Critic-Selector architecture. Rather than following a fixed pipeline, ReThinker dynamically allocates computation based on model confidence, enabling adaptive tool invocation, guided multi-dimensional reflection, and robust confidence-weighted selection. To support scalable training without human annotation, we further propose a reverse data synthesis pipeline and an adaptive trajectory recycling strategy that transform successful reasoning traces into high-quality supervision. Experiments on HLE, GAIA, and XBench demonstrate that ReThinker consistently outperforms state-of-the-art foundation models with tools and existing deep research systems, achieving state-of-the-art results on expert-level reasoning tasks.
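
To make the idea of confidence-driven computation and confidence-weighted selection more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how confidence is computed or combined, so the function names (`select_answer`, `solve_with_budget`), the summed-confidence voting rule, the `threshold` of 0.8, and the `max_rounds` limit are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: (answer text, confidence in [0, 1]).
Candidate = Tuple[str, float]


def select_answer(candidates: List[Candidate]) -> Tuple[str, float]:
    """Assumed selector: sum confidence per distinct answer and return the
    winner together with its share of the total confidence mass."""
    if not candidates:
        raise ValueError("need at least one candidate")
    totals = defaultdict(float)
    for answer, conf in candidates:
        totals[answer] += conf
    total = sum(totals.values())
    best = max(totals, key=totals.get)
    share = totals[best] / total if total > 0 else 0.0
    return best, share


def solve_with_budget(
    solve: Callable[[Optional[str]], Candidate],
    critique: Callable[[str], str],
    max_rounds: int = 3,
    threshold: float = 0.8,
) -> Tuple[str, float]:
    """Assumed control loop: keep rethinking only while confidence is low.

    `solve` takes optional critic feedback and returns a candidate;
    `critique` takes an answer and returns feedback for the next attempt.
    Both are stand-ins for model calls.
    """
    candidates: List[Candidate] = []
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        answer, conf = solve(feedback)
        candidates.append((answer, conf))
        if conf >= threshold:
            break  # confident enough: spend no more computation
        feedback = critique(answer)
    return select_answer(candidates)
```

In this sketch an easy question ends after one solver call, while a hard one triggers up to `max_rounds` critique-and-retry rounds before the selector aggregates every attempt. A real system would replace the stub callables with model and tool invocations.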
Problem

Research questions and friction points this paper is trying to address.

scientific reasoning
large language models
expert-level reasoning
tool use
multi-agent coordination
Innovation

Methods, ideas, or system contributions that make the work stand out.

confidence-aware reasoning
guided reflection
adaptive tool use
trajectory recycling
solver-critic-selector architecture