🤖 AI Summary
This work proposes MACRO, a framework for medical imaging agents. Existing agents rely on fixed toolsets and invocation strategies, which degrade under changing clinical demands and require manual reconfiguration. MACRO introduces, for the first time, an experience-driven mechanism for autonomously discovering and registering composite tools. It analyzes verified execution trajectories, synthesizes recurring tool sequences into reusable high-level primitives, and combines them with a lightweight image-feature memory module and a GRPO-like reinforcement learning policy to enable context-aware, reliable tool invocation. The approach moves beyond static tool composition and supports closed-loop self-improvement. Across diverse medical imaging tasks, it improves multi-step tool orchestration accuracy and cross-domain generalization over strong baselines and recent state-of-the-art agentic methods.
📝 Abstract
Clinical image interpretation is inherently multi-step and tool-centric: clinicians iteratively combine visual evidence with patient context, quantify findings, and refine their decisions through a sequence of specialized procedures. While LLM-based agents promise to orchestrate such heterogeneous medical tools, existing systems treat tool sets and invocation strategies as static after deployment. This design is brittle under real-world domain shifts, task variation, and evolving diagnostic requirements, where predefined tool chains frequently degrade and demand costly manual redesign. We propose MACRO, a self-evolving, experience-augmented medical agent that shifts from static tool composition to experience-driven tool discovery. From verified execution trajectories, the agent autonomously identifies recurring, effective multi-step tool sequences, synthesizes them into reusable composite tools, and registers these as new high-level primitives that continuously expand its behavioral repertoire. A lightweight image-feature memory grounds tool selection in visual-clinical context, while a GRPO-like training loop reinforces reliable invocation of the discovered composites, enabling closed-loop self-improvement with minimal supervision. Extensive experiments across diverse medical imaging datasets and tasks demonstrate that autonomous composite tool discovery consistently improves multi-step orchestration accuracy and cross-domain generalization over strong baselines and recent state-of-the-art agentic methods, bridging the gap between brittle static tool use and adaptive, context-aware clinical AI assistance. Code will be available upon acceptance.
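
The discovery step described in the abstract (finding recurring tool sequences in verified trajectories and registering them as composite tools) could be sketched roughly as follows. This is a minimal illustration, not MACRO's actual implementation: the function names, the contiguous n-gram mining, the support threshold, the registry layout, and the tool names are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical trajectory format: (ordered tool names, verified-correct flag).
Trajectory = Tuple[List[str], bool]


def mine_composites(
    trajectories: List[Trajectory],
    min_len: int = 2,
    max_len: int = 3,
    min_support: int = 2,
) -> List[Tuple[str, ...]]:
    """Return contiguous tool sequences that recur in verified trajectories.

    Assumption: "recurring" means appearing in at least `min_support`
    distinct verified trajectories.
    """
    counts: Counter = Counter()
    for tools, verified in trajectories:
        if not verified:
            continue  # only verified executions contribute experience
        seen = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(tools) - n + 1):
                seen.add(tuple(tools[i:i + n]))
        counts.update(seen)  # count each sequence once per trajectory
    frequent = [(seq, c) for seq, c in counts.items() if c >= min_support]
    # Deterministic order: most frequent first, then lexicographic.
    frequent.sort(key=lambda item: (-item[1], item[0]))
    return [seq for seq, _ in frequent]


def register_composites(
    registry: Dict[str, Tuple[str, ...]],
    sequences: List[Tuple[str, ...]],
) -> Dict[str, Tuple[str, ...]]:
    """Add each mined sequence to the tool registry under a composite name."""
    for seq in sequences:
        name = "composite__" + "__".join(seq)
        registry.setdefault(name, seq)  # keep existing entries untouched
    return registry


# Illustrative tool names only; they do not come from the paper.
trajectories = [
    (["segment", "measure", "report"], True),
    (["segment", "measure", "classify"], True),
    (["detect", "measure"], False),
    (["segment", "measure", "report"], True),
]
composites = mine_composites(trajectories)
registry = register_composites({}, composites)
```

In this toy run, only sequences seen in at least two verified trajectories are promoted, so the unverified `detect` then `measure` trajectory contributes nothing. A real system would also need to validate each composite before registration, which this sketch omits.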