🤖 AI Summary
This work addresses two limitations of existing conversational diagnostic systems: they often rely on a model's parametric knowledge, or they assume that patients provide complete information. Neither assumption holds well in real-world initial consultations, where symptom reports are ambiguous and incomplete. The authors propose a multi-turn conversational diagnosis framework that explicitly explores a diagnostic knowledge graph during reasoning. The system alternates between generating diagnostic hypotheses from the dialogue context and verifying them through clarifying questions until it reaches a final diagnosis. For evaluation, the authors adopt an established patient simulator with patient profiles from the MIMIC-IV dataset and adapt it to describe symptoms vaguely, as patients often do in early clinical encounters. Experiments show that the approach outperforms strong baselines in both diagnostic accuracy and consultation efficiency, and evaluations by physicians support the realism of the simulator and the clinical utility of the generated questions.
📝 Abstract
Conversational diagnosis requires multi-turn history-taking, where an agent asks clarifying questions to refine differential diagnoses under incomplete information. Existing approaches often rely on the parametric knowledge of a model or assume that patients provide rich and concrete information, an assumption that is unrealistic in practice. To address these limitations, we propose a conversational diagnosis system that explores a diagnostic knowledge graph to reason in two steps: (i) generating diagnostic hypotheses from the dialogue context, and (ii) verifying hypotheses through clarifying questions. The two steps are repeated until a final diagnosis is reached. Since evaluating the system requires a realistic patient simulator that responds to the system's questions, we adopt a well-established simulator along with patient profiles from MIMIC-IV. We further adapt it to describe symptoms vaguely to reflect real-world patients during early clinical encounters. Experiments show improved diagnostic accuracy and efficiency over strong baselines, and evaluations by physicians support the realism of our simulator and the clinical utility of the generated questions. Our code will be released upon publication.
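As a rough illustration of the two-step loop described in the abstract, the following is a minimal sketch and not the authors' implementation, which has not been released. It assumes a toy knowledge graph stored as a plain dictionary from disease to symptoms, a simple overlap score for hypothesis generation, and a "split the hypotheses evenly" heuristic for question selection. All names here (`KG`, `generate_hypotheses`, `pick_question`, `diagnose`) and all graph contents are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical toy knowledge graph: disease -> associated symptoms.
# The graph used in the paper is not specified here; this is illustration only.
KG: Dict[str, Set[str]] = {
    "influenza": {"fever", "cough", "muscle aches"},
    "pneumonia": {"fever", "cough", "chest pain"},
    "migraine": {"headache", "nausea", "light sensitivity"},
}


def generate_hypotheses(confirmed: Set[str], denied: Set[str],
                        kg: Dict[str, Set[str]]) -> List[str]:
    """Step (i): score each disease by confirmed minus denied symptoms
    and return the top-scoring diseases in alphabetical order."""
    scores = {d: len(s & confirmed) - len(s & denied) for d, s in kg.items()}
    best = max(scores.values())
    return sorted(d for d, score in scores.items() if score == best)


def pick_question(hypotheses: List[str], asked: Set[str],
                  kg: Dict[str, Set[str]]) -> Optional[str]:
    """Step (ii): pick an unasked symptom that splits the current
    hypotheses as evenly as possible (assumed heuristic)."""
    candidates = set().union(*(kg[d] for d in hypotheses)) - asked
    if not candidates:
        return None

    def imbalance(symptom: str) -> int:
        n = sum(symptom in kg[d] for d in hypotheses)
        return abs(2 * n - len(hypotheses))

    # Sorting first makes tie-breaking deterministic.
    return min(sorted(candidates), key=imbalance)


def diagnose(initial: Set[str], answer: Callable[[str], bool],
             kg: Dict[str, Set[str]], max_turns: int = 10) -> Tuple[str, int]:
    """Alternate steps (i) and (ii) until one hypothesis remains,
    no question is left, or the turn budget is spent."""
    confirmed, denied = set(initial), set()
    turns = 0
    while True:
        hypotheses = generate_hypotheses(confirmed, denied, kg)
        if len(hypotheses) == 1 or turns >= max_turns:
            return hypotheses[0], turns
        question = pick_question(hypotheses, confirmed | denied, kg)
        if question is None:
            return hypotheses[0], turns
        turns += 1
        (confirmed if answer(question) else denied).add(question)


# Simulated patient who initially reports only "fever".
patient = {"fever", "cough", "chest pain"}
diagnosis, turns = diagnose({"fever"}, lambda s: s in patient, KG)
print(diagnosis, turns)
```

In this sketch the simulated patient answers yes or no exactly. The vague symptom descriptions that the paper's adapted simulator produces would need an extra step that maps free-text answers back to graph nodes, which is omitted here.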