🤖 AI Summary
Clinical LLM deployment faces a trade-off between privacy and reasoning capability: cloud-based models perform strongly but risk exposing sensitive patient data, whereas on-device models keep data private yet often fall short on complex clinical reasoning. This paper proposes MedOrchestra, a cloud-edge collaborative inference framework that separates task decomposition from privacy-preserving execution. A large cloud model breaks the clinical task into subtasks and generates guideline-compliant prompts, which it validates on synthetic test cases; a lightweight on-device model then runs the subtasks locally and fuses the results. The cloud model never receives raw clinical data. The approach combines NCCN guideline-based task structuring, clinically informed prompt engineering, and multi-stage LLM orchestration. On pancreatic cancer staging with 100 radiology reports, the method achieves 70.21% accuracy on free-text reports, surpassing local baselines (48.94%–56.59%) and board-certified clinicians (55.32%–65.96%), and 85.42% on structured reports.
📝 Abstract
Deploying large language models (LLMs) in clinical settings involves a critical trade-off: cloud LLMs, with their larger parameter counts and superior performance, pose risks to the privacy of sensitive clinical data, while local LLMs preserve privacy but often fail at complex clinical interpretation tasks. We propose MedOrchestra, a hybrid framework in which a cloud LLM decomposes a complex clinical task into manageable subtasks and generates a prompt for each, while a local LLM executes these subtasks in a privacy-preserving manner. Without accessing clinical data, the cloud LLM generates and validates the subtask prompts using clinical guidelines and synthetic test cases. The local LLM then executes the subtasks locally, using the prompts generated by the cloud LLM, and synthesizes the outputs. We evaluate MedOrchestra on pancreatic cancer staging using 100 radiology reports under NCCN guidelines. On free-text reports, MedOrchestra achieves 70.21% accuracy, outperforming local model baselines (without guideline: 48.94%, with guideline: 56.59%) and board-certified clinicians (gastroenterologists: 59.57%, surgeons: 65.96%, radiologists: 55.32%). On structured reports, MedOrchestra reaches 85.42% accuracy, the highest across all settings.
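The division of labor described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Subtask`, `plan_on_cloud`, and `run_locally` are hypothetical, both models are stand-in callables that map a prompt string to a completion string, the one-subtask-per-guideline-line rule is an assumption made for illustration, and the validation step is a toy check rather than the paper's procedure. The point it shows is structural: the cloud stage sees only guideline text and synthetic cases, while the real report is touched only by the local stage.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


@dataclass
class Subtask:
    name: str
    prompt_template: str  # expected to contain the literal placeholder "{report}"


def plan_on_cloud(cloud_llm: LLM, guideline: str, synthetic_cases: List[str]) -> List[Subtask]:
    """Cloud stage: receives guideline text and synthetic cases, never patient data."""
    subtasks = []
    # Assumption for illustration: one subtask per non-empty guideline line.
    rules = [line.strip() for line in guideline.splitlines() if line.strip()]
    for i, rule in enumerate(rules):
        template = cloud_llm(
            f"Write a prompt template containing {{report}} that checks: {rule}"
        )
        # Toy validation: the placeholder must exist and the template must
        # render to a non-empty prompt for every synthetic case.
        valid = "{report}" in template and all(
            template.replace("{report}", case) for case in synthetic_cases
        )
        if valid:
            subtasks.append(Subtask(name=f"subtask_{i}", prompt_template=template))
    return subtasks


def run_locally(local_llm: LLM, subtasks: List[Subtask], report: str) -> str:
    """Local stage: the only place where the real report is used."""
    answers = {
        task.name: local_llm(task.prompt_template.replace("{report}", report))
        for task in subtasks
    }
    # Fuse the per-subtask answers with one more local call.
    summary = "\n".join(f"{name}: {answer}" for name, answer in answers.items())
    return local_llm(f"Combine these findings into a final answer:\n{summary}")
```

In use, `plan_on_cloud` would be called once per task with the guideline text, and `run_locally` once per report on the on-device model. Keeping the report out of the signature of `plan_on_cloud` makes the privacy boundary visible in the code itself.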