🤖 AI Summary
Existing search-engine-based tool-integrated reasoning (TIR) agents rely on reinforcement learning but suffer from sparse rewards, inefficient exploration, and training instability on complex multi-hop question answering. To address these challenges, we propose CriticSearch, a framework built around a retrospective critic that generates dense, turn-level feedback by leveraging complete reasoning trajectories and ground-truth answers, enabling fine-grained credit assignment. The critic is a frozen, asymmetric large language model that delivers stable evaluation signals, jointly optimizing tool invocation and multi-hop retrieval strategies. On mainstream multi-hop reasoning benchmarks, our approach significantly outperforms strong baselines: it accelerates convergence by 32%, improves final performance by 11.4%, and reduces training variance by 47%, achieving superior efficiency, training stability, and generalization across diverse reasoning tasks.
📝 Abstract
Tool-Integrated Reasoning (TIR) with search engines enables large language models to iteratively retrieve up-to-date external knowledge, enhancing adaptability and generalization in complex question-answering tasks. However, existing search-agent pipelines typically depend on reinforcement-learning-based optimization, which often suffers from sparse outcome rewards, leading to inefficient exploration and unstable training. We introduce CriticSearch, a fine-grained credit-assignment framework that supplies dense, turn-level feedback via a retrospective critic mechanism. During training, a frozen, asymmetric critic LLM retrospectively evaluates each turn using privileged information from the full trajectory and gold answers, converting these assessments into stable, dense rewards that guide policy improvement. Experimental results across diverse multi-hop reasoning benchmarks demonstrate that CriticSearch consistently outperforms existing baselines, achieving faster convergence, improved training stability, and higher performance.
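The retrospective credit-assignment idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Turn` dataclass, the `retrospective_rewards` helper, the blending weight, and the `toy_critic` stand-in for the frozen critic LLM are all hypothetical names and choices introduced here for clarity. The key point it shows is that the critic scores each turn with privileged access to the full trajectory and the gold answer, and those per-turn scores are blended with the sparse episode outcome into dense rewards.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Turn:
    query: str        # tool call (search query) issued at this turn
    observation: str  # evidence returned by the search engine

def retrospective_rewards(
    turns: List[Turn],
    gold_answer: str,
    critic: Callable[[Turn, List[Turn], str], float],
    outcome_reward: float,
    dense_weight: float = 0.5,
) -> List[float]:
    """Score each turn retrospectively, using privileged information
    (the complete trajectory and the gold answer), then blend each
    turn-level score with the sparse episode-level outcome reward."""
    dense = [critic(t, turns, gold_answer) for t in turns]
    return [outcome_reward + dense_weight * d for d in dense]

# Toy stand-in for the frozen critic LLM (hypothetical heuristic):
# credit turns whose retrieved evidence mentions the gold answer.
def toy_critic(turn: Turn, trajectory: List[Turn], gold: str) -> float:
    return 1.0 if gold.lower() in turn.observation.lower() else 0.0

traj = [
    Turn("capital of France?", "Paris is the capital of France."),
    Turn("population of Paris?", "About 2.1 million people."),
]
# Episode answered correctly -> outcome reward 1.0, densified per turn.
rewards = retrospective_rewards(traj, "Paris", toy_critic, outcome_reward=1.0)
```

Here the first turn receives extra credit because its observation contains the gold answer, while the second still inherits the episode outcome, so every turn gets a non-sparse learning signal.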