Mind the Tool Failures: Achieving Synergistic Tool Gains for Medical Agents

📅 2026-05-26
🤖 AI Summary
This work addresses the safety risks in medical AI agents arising from instance-level failures of external tools, a challenge inadequately handled by existing approaches. To overcome the limitations of conventional task-level tool selection, the study formulates tool usage as an instance-level selection problem and introduces a novel instance-aware tool collaboration mechanism. Built upon the GRPO reinforcement learning framework, the method integrates a probability-risk-minimization reward, disagreement-aware collaborative learning, and an entropy-guided sampling strategy to correct erroneous consensus among tools at the instance level. Extensive experiments across two tasks and seven medical benchmarks demonstrate that the proposed approach substantially enhances system robustness and effectively narrows the risk gap between single-tool usage and an ideal instance-level selector.
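One way to picture the entropy-guided sampling idea is to score each instance by how much the tools disagree on it and sample in proportion to that score. The sketch below is illustrative only and is not the paper's implementation: it assumes disagreement is measured as the Shannon entropy of the tools' answers, and the helper names `answer_entropy` and `sampling_weights` and the `floor` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def answer_entropy(answers: Sequence[str]) -> float:
    """Shannon entropy (in nats) of the empirical distribution of tool answers."""
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def sampling_weights(tool_answers: List[Sequence[str]], floor: float = 0.1) -> List[float]:
    """Normalized sampling weights that grow with tool disagreement.

    `floor` (an assumption, not from the paper) keeps unanimous instances
    from receiving zero probability.
    """
    scores = [answer_entropy(a) + floor for a in tool_answers]
    total = sum(scores)
    return [s / total for s in scores]


# Three instances: unanimous, partial disagreement, full disagreement.
weights = sampling_weights([["A", "A", "A"], ["A", "B", "A"], ["A", "B", "C"]])
```

In this toy setup the unanimous instance receives the smallest weight and the fully split instance the largest, so a sampler such as `random.choices(instances, weights=weights)` would draw high-disagreement instances more often.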
📝 Abstract
Medical AI agents increasingly use external tools for diagnosis, treatment recommendation, and evidence retrieval, yet most existing approaches assume that task-appropriate tools are reliable within their intended scope. This assumption is fragile in real clinical settings, where even relevant tools may fail on challenging instances and lead to unsafe downstream decisions. To address this issue, we study medical tool use under imperfect-tool settings, aiming to correct the failure instances that individual tools miss. Instance-dependent failure patterns create a gap between the best fixed single tool and an ideal instance-wise selector, which we refer to as the Single-Oracle risk gap. The core challenge is that conventional task-level tool selection cannot close this gap, because its performance is inherently bounded by that of the best single tool. Motivated by this observation, we account for instance-level heterogeneity and formulate tool use as an instance-level selection problem. Specifically, we propose a GRPO-based reinforcement learning framework with rewards for probabilistic risk minimization and disagreement-aware synergy learning, which promotes instance-level correction of erroneous tool consensus. Furthermore, an entropy-guided sampling strategy upweights high-disagreement instances, which provide stronger signals for learning instance-specific tool synergy. These two components complement each other in mitigating instance-level heterogeneity and improving tool synergy. Experiments on two tasks and seven medical benchmarks show that our method achieves consistent, stable improvements over a broad range of baselines, highlighting the importance of synergy-aware tool use for reliable medical agentic systems.
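The Single-Oracle risk gap can be made concrete with a small calculation. The sketch below assumes risk is the plain error rate and that per-instance correctness of every tool is known. Both are simplifying assumptions for illustration, and the function name `single_oracle_gap` is hypothetical rather than taken from the paper.

```python
from typing import List, Tuple


def single_oracle_gap(correct: List[List[bool]]) -> Tuple[float, float, float]:
    """Return (best single-tool risk, oracle risk, gap).

    correct[i][t] is True when tool t answers instance i correctly.
    Risk here is the error rate (an illustrative assumption).
    """
    n = len(correct)
    n_tools = len(correct[0])
    # Risk of committing to one fixed tool for every instance.
    single_risks = [
        sum(not row[t] for row in correct) / n for t in range(n_tools)
    ]
    best_single = min(single_risks)
    # An ideal instance-wise selector fails only when every tool fails.
    oracle = sum(not any(row) for row in correct) / n
    return best_single, oracle, best_single - oracle


# Four instances, two tools with complementary failures.
example = [
    [True, False],
    [False, True],
    [True, True],
    [False, False],
]
best_single, oracle, gap = single_oracle_gap(example)
```

In this toy example each tool alone errs on half of the instances, while the instance-wise oracle errs only on the one instance where both tools fail, so a quarter of the instances are recoverable only through instance-level selection.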
Problem

Research questions and friction points this paper is trying to address.

tool failure
medical AI agents
instance-level heterogeneity
tool reliability
clinical decision-making
Innovation

Methods, ideas, or system contributions that make the work stand out.

instance-level tool selection
tool synergy
GRPO reinforcement learning
risk-aware medical AI
disagreement-aware learning
🔎 Similar Papers
Yunhui Gan
1 Fudan University, 2 Shanghai Academy of Artificial Intelligence for Science, 3 Shanghai Innovation Institute
Tan Pan
Fudan University
Computer Vision, AI4Science, Self-supervised Learning
Kaiyu Guo
4 The University of Queensland, 2 Shanghai Academy of Artificial Intelligence for Science
Limei Han
1 Fudan University, 2 Shanghai Academy of Artificial Intelligence for Science
Weimiao Yu
5 Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR)
Guangnan Ye
Fudan University
Computer Vision, Machine Learning
Chen Jiang
SAIS
AI for Life Science, Multimodal Learning, Multimodal Foundation Model
Yuan Cheng
1 Fudan University, 2 Shanghai Academy of Artificial Intelligence for Science, 3 Shanghai Innovation Institute