🤖 AI Summary
To address passive interaction, weak context management, and limited support for high-concurrency scenarios in AI-powered pre-consultation systems, this paper proposes a hierarchical, centrally controlled architecture of eight agents that orchestrates tasks proactively and works across foundation models. The system decomposes pre-consultation into four tasks (triage, history of present illness collection, past history collection, and chief complaint generation) and dynamically schedules 13 fine-grained subtasks within the first three; it runs on locally deployed models, which keeps patient data on-premises. Evaluated on 1,372 validated electronic health records with GPT-OSS 20B, Qwen3-8B, and Phi4-14B, the system achieves 87.0% primary department triage accuracy, 80.5% secondary department classification accuracy, and a 98.2% task completion rate with agent-driven scheduling (versus 93.1% with sequential processing). Eighteen physicians rated the outputs 4.56 (chief complaint), 4.48 (history of present illness), and 4.69 (past history) on a 5-point scale, and sessions completed within 12.7 rounds for history of present illness collection and 16.9 rounds for past history collection.
📝 Abstract
Global healthcare systems face critical challenges from increasing patient volumes and limited consultation times, with primary care visits averaging under 5 minutes in many countries. While pre-consultation processes encompassing triage and structured history-taking offer potential solutions, they remain limited by passive interaction paradigms and context management challenges in existing AI systems. This study introduces a hierarchical multi-agent framework that transforms passive medical AI systems into proactive inquiry agents through autonomous task orchestration. We developed an eight-agent architecture with centralized control mechanisms that decomposes pre-consultation into four primary tasks: Triage ($T_1$), History of Present Illness collection ($T_2$), Past History collection ($T_3$), and Chief Complaint generation ($T_4$), with $T_1$--$T_3$ further divided into 13 domain-specific subtasks. Evaluated on 1,372 validated electronic health records from a Chinese medical platform across multiple foundation models (GPT-OSS 20B, Qwen3-8B, Phi4-14B), the framework achieved 87.0% accuracy for primary department triage and 80.5% for secondary department classification, with task completion rates reaching 98.2% using agent-driven scheduling versus 93.1% with sequential processing. Clinical quality scores from 18 physicians averaged 4.56 for Chief Complaints, 4.48 for History of Present Illness, and 4.69 for Past History on a 5-point scale, with consultations completed within 12.7 rounds for $T_2$ and 16.9 rounds for $T_3$. The model-agnostic architecture maintained high performance across different foundation models while preserving data privacy through local deployment, demonstrating the potential for autonomous AI systems to enhance pre-consultation efficiency and quality in clinical settings.
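The contrast between agent-driven scheduling and sequential processing can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not list the 13 subtasks or describe the eight agents, so the subtask names, the `Controller` class, and both policies below are hypothetical. In particular, `mention_policy` is a simple keyword heuristic standing in for whatever model-based scheduler the framework actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Subtask:
    name: str
    done: bool = False


@dataclass
class Task:
    task_id: str  # "T1".."T4", following the abstract's notation
    subtasks: List[Subtask] = field(default_factory=list)

    def pending(self) -> List[Subtask]:
        return [s for s in self.subtasks if not s.done]


# A policy picks the next subtask from the pending ones, given the dialogue so far.
Policy = Callable[[List[Subtask], List[str]], Subtask]


def sequential_policy(pending: List[Subtask], context: List[str]) -> Subtask:
    """Baseline: always take the next subtask in its fixed order."""
    return pending[0]


def mention_policy(pending: List[Subtask], context: List[str]) -> Subtask:
    """Hypothetical stand-in for agent-driven scheduling: prefer a subtask
    that the latest reply already mentions, else fall back to fixed order."""
    last = context[-1].lower() if context else ""
    for sub in pending:
        if sub.name.lower() in last:
            return sub
    return pending[0]


class Controller:
    """Central controller: walks the tasks in order and asks the policy
    which pending subtask to pursue in each dialogue round."""

    def __init__(self, tasks: List[Task], policy: Policy, max_rounds: int):
        self.tasks = tasks
        self.policy = policy
        self.max_rounds = max_rounds

    def run(self, ask: Callable[[str, str], str]) -> Dict[str, object]:
        context: List[str] = []
        order: List[str] = []
        rounds = 0
        for task in self.tasks:
            while task.pending() and rounds < self.max_rounds:
                sub = self.policy(task.pending(), context)
                reply = ask(task.task_id, sub.name)
                rounds += 1
                order.append(sub.name)
                if reply.strip():  # an empty reply leaves the subtask pending
                    context.append(reply)
                    sub.done = True
        total = sum(len(t.subtasks) for t in self.tasks)
        done = sum(s.done for t in self.tasks for s in t.subtasks)
        rate = done / total if total else 1.0
        return {"rounds": rounds, "order": order, "completion_rate": rate}


def build_tasks() -> List[Task]:
    # Placeholder subtask names; the abstract does not enumerate the real ones.
    return [
        Task("T1", [Subtask("department")]),
        Task("T2", [Subtask("onset"), Subtask("duration"), Subtask("severity")]),
        Task("T3", [Subtask("allergies"), Subtask("surgery")]),
        Task("T4"),  # generated from collected context; no subtasks here
    ]


def scripted_patient(task_id: str, subtask: str) -> str:
    # Toy patient: the answer about onset also volunteers severity.
    if subtask == "onset":
        return "It started yesterday and the severity is high."
    return f"Answer about {subtask}."


if __name__ == "__main__":
    for policy in (sequential_policy, mention_policy):
        result = Controller(build_tasks(), policy, max_rounds=20).run(scripted_patient)
        print(policy.__name__, result)
```

In this toy run both policies finish every subtask; they differ only in the order of inquiry, since the heuristic follows up on severity as soon as the patient mentions it. The sketch makes no attempt to reproduce the reported gap between 98.2% and 93.1% completion.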