ReThinker: Scientific Reasoning by Rethinking with Guided Reflection and Confidence Control

📅 2026-02-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of large language models on expert-level scientific reasoning tasks, such as those in the Humanity's Last Exam (HLE) benchmark, where fixed tool pipelines, fragile multi-agent collaboration, and inefficient test-time scaling hinder performance. The authors propose ReThinker, a framework built on a Solver-Critic-Selector architecture that allocates computation dynamically according to model confidence. ReThinker supports adaptive tool invocation, guided multi-dimensional reflection, and confidence-weighted selection. Its main contributions are a confidence-aware reasoning mechanism, a reverse data synthesis pipeline that requires no human annotation, and an adaptive trajectory recycling strategy that turns successful reasoning traces into training supervision. On HLE, GAIA, and XBench, the authors report that ReThinker outperforms both state-of-the-art foundation models with tools and existing deep research systems.

📝 Abstract

Expert-level scientific reasoning remains challenging for large language models, particularly on benchmarks such as Humanity's Last Exam (HLE), where rigid tool pipelines, brittle multi-agent coordination, and inefficient test-time scaling often limit performance. We introduce ReThinker, a confidence-aware agentic framework that orchestrates retrieval, tool use, and multi-agent reasoning through a stage-wise Solver-Critic-Selector architecture. Rather than following a fixed pipeline, ReThinker dynamically allocates computation based on model confidence, enabling adaptive tool invocation, guided multi-dimensional reflection, and robust confidence-weighted selection. To support scalable training without human annotation, we further propose a reverse data synthesis pipeline and an adaptive trajectory recycling strategy that transform successful reasoning traces into high-quality supervision. Experiments on HLE, GAIA, and XBench demonstrate that ReThinker consistently outperforms state-of-the-art foundation models with tools and existing deep research systems, achieving state-of-the-art results on expert-level reasoning tasks.
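
The abstract does not give implementation details, so the following is only a minimal sketch of how confidence-driven computation and confidence-weighted selection could fit together. All names (`select_answer`, `rethink_loop`, `threshold`, `max_rounds`) and the summed-confidence voting rule are assumptions made for illustration, not the authors' method; `solve` and `critique` stand in for model calls.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

# Hypothetical type: an answer string paired with a confidence in [0, 1].
Candidate = Tuple[str, float]


def select_answer(candidates: List[Candidate]) -> Tuple[str, float]:
    """Confidence-weighted vote (assumed rule): sum the confidences per
    distinct answer and return the winner with its share of total weight."""
    if not candidates:
        raise ValueError("no candidates to select from")
    totals = defaultdict(float)
    for answer, confidence in candidates:
        totals[answer] += confidence
    total_weight = sum(totals.values())
    best = max(totals, key=totals.get)
    share = totals[best] / total_weight if total_weight > 0 else 0.0
    return best, share


def rethink_loop(
    solve: Callable[[str, str], Candidate],
    critique: Callable[[str, str], str],
    question: str,
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> Tuple[str, float, int]:
    """Spend extra rounds only while the solver's confidence stays low.

    solve(question, feedback) returns (answer, confidence);
    critique(question, answer) returns feedback text for the next round.
    Both are placeholders for model calls and are assumptions here.
    """
    candidates: List[Candidate] = []
    feedback = ""
    for _ in range(max_rounds):
        answer, confidence = solve(question, feedback)
        candidates.append((answer, confidence))
        if confidence >= threshold:
            break  # confident enough: stop allocating more computation
        feedback = critique(question, answer)
    best, share = select_answer(candidates)
    return best, share, len(candidates)
```

In this sketch an easy question ends after one solver call, while a low-confidence answer triggers a critique and another attempt, and the selector then aggregates every attempt rather than trusting only the last one.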

Problem

Research questions and friction points this paper is trying to address.

scientific reasoning

large language models

expert-level reasoning

tool use

multi-agent coordination

Innovation

Methods, ideas, or system contributions that make the work stand out.

confidence-aware reasoning

guided reflection

adaptive tool use