🤖 AI Summary
Current medical AI agents are constrained by static planning strategies, which limits their adaptability to complex, multi-step clinical research tasks. To address this, we propose a meta-level self-evolving architecture that integrates procedural knowledge distillation with strategic trajectory analysis to construct a persistent, updatable strategic knowledge base. This introduces, for the first time, a self-improving meta-planning mechanism into medical AI agents. Our method combines large language models, a meta-planning framework, and reinforcement learning to enable online policy evolution and generalizable optimization, and it is evaluated on EHRFlowBench, a realistic electronic health record benchmark. Experiments demonstrate that our agent significantly outperforms existing state-of-the-art approaches on EHRFlowBench, validating its effectiveness in multi-step clinical reasoning, cross-task transfer, and long-horizon strategic adaptation. This advances medical AI from a passive tool user to an autonomous task orchestrator.
📝 Abstract
The efficacy of AI agents in healthcare research is hindered by their reliance on static, predefined strategies. This creates a critical limitation: agents can become better tool-users but cannot learn to become better strategic planners, a crucial skill for complex domains like healthcare. We introduce HealthFlow, a self-evolving AI agent that overcomes this limitation through a novel meta-level evolution mechanism. HealthFlow autonomously refines its own high-level problem-solving policies by distilling procedural successes and failures into a durable, strategic knowledge base. To anchor our research and facilitate reproducible evaluation, we introduce EHRFlowBench, a new benchmark featuring complex, realistic health data analysis tasks derived from peer-reviewed clinical research. Our comprehensive experiments demonstrate that HealthFlow's self-evolving approach significantly outperforms state-of-the-art agent frameworks. This work marks a necessary shift from building better tool-users to designing smarter, self-evolving task-managers, paving the way for more autonomous and effective AI for scientific discovery.