ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current clinical decision support systems predominantly rely on static, pre-curated evidence and struggle to actively retrieve and integrate evidence from heterogeneous multimodal sources (electronic health records, medical knowledge bases, and imaging) in real-world settings. This work proposes ClinSeekAgent, a framework that shifts from passive evidence consumption to active evidence acquisition. Given a raw clinical query, the agent dynamically plans its retrieval steps, iteratively combines multimodal information to refine its hypotheses, and produces evidence-grounded decisions. The framework supports both inference and training: at training time, agent trajectories are distilled into open-source models to improve their performance. On ClinSeek-Bench, the reported text-only gains in overall F1 are +3.2 points for Claude Opus 4.6 and +4.2 points for MiniMax M2.5, and the reported multimodal gain for Claude Opus 4.6 is +15.1 points. The distilled model ClinSeek-35B-A3B attains an average F1 of 34.0 on AgentEHR-Bench, 11.9 points above its Qwen3.5-35B-A3B baseline.

📝 Abstract

Large language models (LLMs) and agentic systems have shown promise for clinical decision support, but existing work largely assumes that evidence has already been curated and handed to the model. Real-world clinical workflows instead require agents to actively seek, iteratively plan, and synthesize multimodal evidence from heterogeneous sources. In this paper, we introduce ClinSeekAgent, an automated agentic framework for dynamic multimodal evidence seeking that shifts the paradigm from passive evidence consumption to active evidence acquisition. Given only a clinical query and access to raw data sources, ClinSeekAgent gathers evidence by querying medical knowledge bases, navigating raw EHRs, and invoking medical imaging tools; refines its hypotheses as new information emerges; and integrates the collected evidence into grounded clinical decisions. ClinSeekAgent serves both as an inference-time agent for frontier LLMs and as a training-time pipeline for distilling high-quality agent trajectories into compact open-source models. To validate its inference-time effectiveness, we construct ClinSeek-Bench, which pairs Curated Input reasoning from fixed pre-selected evidence with Automated Evidence-Seeking over raw clinical data. On text-only EHR tasks, ClinSeekAgent improves Claude Opus 4.6 from 60.0 to 63.2 overall F1 and MiniMax M2.5 from 43.1 to 47.3, with positive risk-prediction gains in 7 out of 9 evaluated host models. On multimodal tasks, ClinSeekAgent improves Claude Opus 4.6 from 47.5 to 62.6 (+15.1); all evaluated models improve across the three CXR-related task groups. We further validate ClinSeekAgent as a training pipeline by distilling agentic evidence-seeking trajectories into ClinSeek-35B-A3B, which achieves 34.0 average F1 on the existing AgentEHR-Bench, improving over its Qwen3.5-35B-A3B baseline by +11.9 points and approaching Claude Opus 4.6.
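The seek, refine, and integrate cycle described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `AgentState`, `seek_evidence`, `plan`, and `refine`, the three tool keys, and the stub tool outputs are all hypothetical, and in a real system the planning and refinement steps would presumably be handled by the host LLM rather than by fixed rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool interface: takes the query, returns evidence snippets.
Tool = Callable[[str], List[str]]


@dataclass
class AgentState:
    query: str
    steps: List[str] = field(default_factory=list)  # tools called, in order
    evidence: List[Tuple[str, str]] = field(default_factory=list)  # (source, snippet)
    hypothesis: str = "undetermined"


def seek_evidence(
    query: str,
    tools: Dict[str, Tool],
    plan: Callable[[AgentState], Optional[str]],
    refine: Callable[[AgentState], str],
    max_steps: int = 5,
) -> AgentState:
    """Iteratively pick a tool, collect its evidence, and update the hypothesis."""
    state = AgentState(query=query)
    for _ in range(max_steps):
        tool_name = plan(state)  # decide where to look next
        if tool_name is None:    # planner signals that evidence is sufficient
            break
        state.steps.append(tool_name)
        for snippet in tools[tool_name](state.query):
            state.evidence.append((tool_name, snippet))
        state.hypothesis = refine(state)  # revise in light of new evidence
    return state


# Toy stand-ins (assumptions for illustration only, not real clinical tools).
def toy_plan(state: AgentState) -> Optional[str]:
    for name in ("knowledge_base", "ehr", "imaging"):
        if name not in state.steps:
            return name
    return None


def toy_refine(state: AgentState) -> str:
    return f"hypothesis supported by {len(state.evidence)} evidence item(s)"


toy_tools: Dict[str, Tool] = {
    "knowledge_base": lambda q: [f"reference text matching '{q}'"],
    "ehr": lambda q: ["lab value A", "medication B"],
    "imaging": lambda q: ["imaging finding C"],
}

final = seek_evidence("toy query", toy_tools, toy_plan, toy_refine)
print(final.steps)
print(final.hypothesis)
```

Keeping `plan` and `refine` as injected callables separates the control loop from the decision logic, so the same loop could in principle be driven by different host models. The `max_steps` cap is an assumed safeguard against unbounded tool calls.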

Problem

Research questions and friction points this paper is trying to address.

clinical reasoning

evidence seeking

multimodal data

agentic systems

clinical decision support

Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic clinical reasoning

multimodal evidence seeking

automated evidence acquisition