🤖 AI Summary
Deep search agents often perform inefficiently or inaccurately because they struggle to decide when to stop searching, which leads to either excessive or insufficient search. To address this, the paper introduces causal intervention into the calibration of decision boundaries for the first time and proposes Decision Boundary Alignment for Deep Search agents (DAS). DAS builds preference data by contrasting factual and counterfactual search trajectories and uses preference optimization to align both the search process and its outcomes. On public benchmarks, the method substantially reduces suboptimal search behavior, improving answer accuracy and search efficiency at the same time.
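The contrast between factual and counterfactual trajectories can be sketched as a per-step check. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each decision point is a binary choice between "search" and "answer", and that the correctness of both the factual action and the intervened (counterfactual) action is already known from rollouts. `DecisionPoint`, `PreferencePair`, `diagnose`, and `build_preferences` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class DecisionPoint:
    """Hypothetical record of one step where the agent chose to search or answer."""
    context: str         # question plus evidence accumulated so far
    factual_action: str  # what the agent actually did: "search" or "answer"


@dataclass
class PreferencePair:
    context: str
    chosen: str    # preferred action at this decision point
    rejected: str  # dispreferred action
    label: str     # "over-search" or "under-search"


def diagnose(point: DecisionPoint,
             answer_now_correct: bool,
             search_more_correct: bool) -> Optional[PreferencePair]:
    """Compare the factual action with its counterfactual alternative (assumed setup).

    answer_now_correct: would answering at this point give a correct answer?
    search_more_correct: would continuing to search give a correct answer?
    One value is the factual outcome; the other comes from a counterfactual
    rollout that intervenes on the action taken at this step.
    """
    if point.factual_action == "search" and answer_now_correct:
        # Over-search: evidence was already sufficient, yet the agent kept searching.
        return PreferencePair(point.context, "answer", "search", "over-search")
    if (point.factual_action == "answer" and not answer_now_correct
            and search_more_correct):
        # Under-search: the agent stopped early and answered wrongly,
        # while searching further would have produced a correct answer.
        return PreferencePair(point.context, "search", "answer", "under-search")
    return None  # no boundary error detected at this step


def build_preferences(points: Sequence[DecisionPoint],
                      outcomes: Sequence[Tuple[bool, bool]]) -> List[PreferencePair]:
    """Collect a preference pair from every decision point flagged as a boundary error."""
    pairs = []
    for point, (answer_now, search_more) in zip(points, outcomes):
        pair = diagnose(point, answer_now, search_more)
        if pair is not None:
            pairs.append(pair)
    return pairs


# Toy usage with made-up outcomes (not results from the paper).
points = [
    DecisionPoint("q1 + doc A", "search"),  # could already answer: over-search
    DecisionPoint("q2", "answer"),          # answered too early: under-search
    DecisionPoint("q3 + doc B", "search"),  # searching was justified
]
outcomes = [(True, True), (False, True), (False, True)]
pairs = build_preferences(points, outcomes)
for p in pairs:
    print(p.label, "| prefer", p.chosen, "over", p.rejected)
```

In a real pipeline the `chosen` and `rejected` fields would presumably hold full model continuations (a search query or a final answer) rather than action labels; strings are used here only to keep the sketch small.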
📝 Abstract
Deep search agents, which autonomously iterate through multi-turn web-based reasoning, represent a promising paradigm for complex information-seeking tasks. However, current agents suffer from a critical inefficiency: they search excessively because they cannot accurately judge when to stop searching and start answering. This stems from outcome-centric training that prioritizes final results over the search process itself. We identify the root cause as misaligned decision boundaries, i.e., the thresholds that determine when the accumulated information is sufficient to answer. Misaligned boundaries cause two kinds of errors: over-search (redundant searching despite sufficient knowledge) and under-search (premature termination that yields incorrect answers). To address these errors, we propose a comprehensive framework comprising two key components. First, we introduce a causal intervention-based diagnosis that identifies boundary errors by comparing factual and counterfactual trajectories at each decision point. Second, we develop Decision Boundary Alignment for Deep Search agents (DAS), which constructs preference datasets from this causal feedback and aligns policies via preference optimization. Experiments on public datasets demonstrate that decision boundary errors are pervasive across state-of-the-art agents. DAS effectively calibrates these boundaries, mitigating both over-search and under-search and achieving substantial gains in accuracy and efficiency. Our code and data are publicly available at: https://github.com/Applied-Machine-Learning-Lab/WWW2026_DAS.
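The abstract states that DAS aligns policies via preference optimization but does not name the objective. One widely used choice is Direct Preference Optimization (DPO), and the sketch below assumes it purely for illustration; the paper's actual loss, its β value, and how log-probabilities are computed over search and answer decisions may differ. `dpo_loss` and `mean_dpo_loss` are hypothetical helper names, and the inputs are sequence log-probabilities of the chosen and rejected continuations under the trained policy and a frozen reference model.

```python
import math
from typing import Sequence, Tuple


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log sigmoid(beta * margin)."""
    # Implicit reward difference: how much more the policy (relative to the
    # frozen reference model) favors the chosen decision over the rejected one.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log sigmoid(m) equals softplus(-m); this form avoids overflow in exp.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def mean_dpo_loss(batch: Sequence[Tuple[float, float, float, float]],
                  beta: float = 0.1) -> float:
    """Average the per-pair loss over a batch of preference pairs."""
    return sum(dpo_loss(*pair, beta=beta) for pair in batch) / len(batch)


# When the policy still matches the reference model, the margin is 0
# and the loss equals log 2.
print(round(dpo_loss(-1.2, -0.8, -1.2, -0.8), 4))  # prints 0.6931
```

Minimizing this loss pushes the policy to raise the likelihood of the preferred decision (for example, answering at an over-search point) relative to the dispreferred one, while the reference terms keep it anchored to the starting model.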