🤖 AI Summary
This work addresses a key limitation of existing large medical language models: they neglect uncertainties inherent in clinical diagnosis, such as physician–patient interaction, diagnostic test selection, and information noise. The authors formalize real-world clinical diagnosis as a noisy partially observable Markov decision process (POMDP) and introduce a systematic noise model covering seven types of patient-related noise and three types of examination-related noise. They design a cost-sensitive composite reward and train the agent in two stages: supervised fine-tuning on synthetic dialogues structured after the Calgary–Cambridge model, followed by policy optimization with the DAPO algorithm. The resulting MedExAgent achieves diagnostic accuracy comparable to larger models while keeping examination strategies cost-efficient in terms of both financial cost and patient discomfort.
📝 Abstract
Real-world clinical diagnosis is a complex process in which the doctor must gather information both by interacting with the patient and by conducting medical exams. The doctor also needs to adapt to different patient personas and to noisy or incomplete information that can arise at any point in the process. However, existing benchmarks for medical LLMs and methods for automatic diagnosis largely simplify this process, reducing it to single-turn question answering, noise-free conversations, or sequential exam ordering, and thereby ignore the interactive and uncertain nature of clinical diagnosis. In this paper, we address this gap by formalizing clinical diagnosis as a Partially Observable Markov Decision Process (POMDP) with three action types: questioning the patient, ordering medical exams as tool calls, and issuing a diagnosis. We also introduce a systematic noise model comprising seven patient noise types and three exam noise types. Using the proposed environment, we train a diagnosis agent, MedExAgent, through a two-stage pipeline that first performs supervised fine-tuning on synthetic conversations structured after the Calgary-Cambridge model for clinical interviews, and then applies DAPO to optimize a composite reward capturing diagnostic accuracy, tool call quality, and exam cost, where exam cost covers both financial cost and patient discomfort. Through extensive experiments and ablation studies, we demonstrate that MedExAgent achieves diagnostic performance comparable to larger models while maintaining cost-efficient examination strategies.
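To make the three action types and the composite reward concrete, the following is a minimal illustrative sketch, not the authors' implementation. The names `ActionType`, `Action`, `ExamCost`, and `composite_reward`, the weights, the linear form of the reward, and the exam catalog are all assumptions introduced here; the abstract only states that the reward captures diagnostic accuracy, tool call quality, and exam cost (financial cost plus patient discomfort).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class ActionType(Enum):
    """The three action types named in the abstract."""
    ASK_PATIENT = "ask_patient"
    ORDER_EXAM = "order_exam"   # issued as a tool call
    DIAGNOSE = "diagnose"


@dataclass
class Action:
    kind: ActionType
    content: str  # question text, exam name, or diagnosis label


@dataclass
class ExamCost:
    financial: float   # hypothetical normalized monetary cost
    discomfort: float  # hypothetical normalized patient discomfort


def composite_reward(
    actions: List[Action],
    true_diagnosis: str,
    exam_catalog: Dict[str, ExamCost],
    w_acc: float = 1.0,    # assumed weight, not from the paper
    w_tool: float = 0.2,   # assumed weight, not from the paper
    w_cost: float = 0.1,   # assumed weight, not from the paper
) -> float:
    """Score one episode as accuracy + tool-call quality - exam cost."""
    # Accuracy: 1 if the last diagnosis issued matches the ground truth.
    diagnoses = [a.content for a in actions if a.kind is ActionType.DIAGNOSE]
    accuracy = 1.0 if diagnoses and diagnoses[-1] == true_diagnosis else 0.0

    # Tool-call quality (assumed proxy): fraction of exam orders that name
    # an exam in the catalog; 0.0 when no exam was ordered.
    exams = [a.content for a in actions if a.kind is ActionType.ORDER_EXAM]
    valid = [e for e in exams if e in exam_catalog]
    tool_quality = len(valid) / len(exams) if exams else 0.0

    # Exam cost: financial cost plus discomfort, summed over valid orders.
    cost = sum(exam_catalog[e].financial + exam_catalog[e].discomfort for e in valid)

    return w_acc * accuracy + w_tool * tool_quality - w_cost * cost


# Usage: a short episode with one question, one exam, and a correct diagnosis.
catalog = {"cbc": ExamCost(financial=0.2, discomfort=0.1)}
episode = [
    Action(ActionType.ASK_PATIENT, "How long have you felt tired?"),
    Action(ActionType.ORDER_EXAM, "cbc"),
    Action(ActionType.DIAGNOSE, "anemia"),
]
reward = composite_reward(episode, "anemia", catalog)
```

Under this sketch, ordering more or costlier exams lowers the reward unless it changes the final diagnosis, which is the trade-off a cost-sensitive reward is meant to expose to the policy optimizer.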