🤖 AI Summary
Current medical AI systems face two key bottlenecks: domain-specific models generalize poorly, while general-purpose large language models lack clinical knowledge and tool integration capabilities. To address this, we propose a scalable clinical decision-making framework built on a modular tool orchestration architecture that integrates heterogeneous clinical tools without modifying the core system. The framework combines multi-agent collaborative scheduling, tool-augmented reasoning, and multimodal (imaging/tabular/text) alignment to generate auditable, traceable, and verifiable reasoning chains. Our core contribution is the generation of clinically interpretable decision pathways whose diagnostic steps can be audited. In evaluation, the framework achieves 93.26% accuracy on Alzheimer's disease diagnosis (surpassing the prior state of the art by 4.1 percentage points), 50.35% accuracy on Alzheimer's disease progression prediction, a macro-averaged AUC of 61.2% on chest X-ray analysis, and 54.47% accuracy on multimodal (image+table) visual question answering.
📝 Abstract
Healthcare decision-making represents one of the most challenging domains for Artificial Intelligence (AI), requiring the integration of diverse knowledge sources, complex reasoning, and various external analytical tools. Current AI systems often rely either on task-specific models, which offer limited adaptability, or on general language models that are not grounded in specialized external knowledge and tools. We introduce MedOrch, a novel framework that orchestrates multiple specialized tools and reasoning agents to provide comprehensive medical decision support. MedOrch employs a modular, agent-based architecture that facilitates the flexible integration of domain-specific tools without altering the core system. Furthermore, it ensures transparent and traceable reasoning processes, enabling clinicians to verify each intermediate step underlying the system's recommendations. We evaluate MedOrch across three distinct medical applications: Alzheimer's disease diagnosis, chest X-ray interpretation, and medical visual question answering, using authentic clinical datasets. The results demonstrate MedOrch's competitive performance across these diverse medical tasks. Notably, in Alzheimer's disease diagnosis, MedOrch achieves an accuracy of 93.26%, surpassing the state-of-the-art baseline by over four percentage points. For predicting Alzheimer's disease progression, it attains an accuracy of 50.35%, marking a significant improvement. In chest X-ray analysis, MedOrch exhibits superior performance with a Macro AUC of 61.2% and a Macro F1-score of 25.5%. Moreover, in complex multimodal visual question answering (Image+Table), MedOrch achieves an accuracy of 54.47%. These findings underscore MedOrch's potential to advance healthcare AI by enabling reasoning-driven tool utilization for multimodal medical data processing and supporting intricate cognitive tasks in clinical decision-making.
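
The abstract describes two architectural properties: tools can be added without altering the core system, and every intermediate step stays traceable. A minimal Python sketch of that general pattern is given here for illustration only. It is not MedOrch's implementation; the `Orchestrator` and `TraceStep` classes and the two toy tools (`threshold_flag`, `summarize`) are hypothetical names invented for this example, and the numeric inputs carry no clinical meaning.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class TraceStep:
    """One recorded tool call: which tool ran, with what inputs and output."""
    tool: str
    inputs: Dict[str, Any]
    output: Any


@dataclass
class Orchestrator:
    """Hypothetical core: tools register by name and every call is logged."""
    tools: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    trace: List[TraceStep] = field(default_factory=list)

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        # Adding a tool only touches the registry, never the call logic below.
        self.tools[name] = fn

    def call(self, name: str, **inputs: Any) -> Any:
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        output = self.tools[name](**inputs)
        # Record the step so a reviewer can audit the chain afterwards.
        self.trace.append(TraceStep(name, inputs, output))
        return output


# Toy stand-in tools (assumptions for illustration, not clinical logic).
def threshold_flag(value: float, cutoff: float) -> bool:
    return value < cutoff


def summarize(flags: List[bool]) -> str:
    return "review" if any(flags) else "no action"


orch = Orchestrator()
orch.register("threshold_flag", threshold_flag)
orch.register("summarize", summarize)

flag = orch.call("threshold_flag", value=21.0, cutoff=24.0)
decision = orch.call("summarize", flags=[flag])

for step in orch.trace:
    print(step.tool, step.inputs, "->", step.output)
```

In this sketch the trace is an ordered list of tool calls, which is one simple way to let a reviewer replay each intermediate step; a real system would presumably also record the reasoning agent's rationale for choosing each tool.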