🤖 AI Summary
This study addresses inefficient information gathering and poorly structured reporting in clinical consultations by proposing a task-oriented large language model (LLM) dialogue system in which medical questions are organized as a directed acyclic graph (DAG). Clinical guidelines and medical algorithms are converted into a question corpus that drives a dynamic interview workflow. Two mechanisms are central: (1) a cold-start mechanism based on hierarchical clustering that selects efficient opening questions without prior patient information, and (2) an expand-and-prune mechanism that branches and backtracks adaptively according to patient responses. The system also includes termination logic that ends the interview once sufficient information has been gathered, and it automatically generates structured reports for physicians. In a preliminary evaluation with five physicians, the patient application showed low workload (NASA-TLX = 15.6), high usability (SUS = 86), and strong satisfaction (QUIS = 8.1/9); the physician application showed moderate workload (NASA-TLX = 26), excellent usability (SUS = 88.5), and satisfaction of 8.3/9. Reported limitations are occasional system latency and a small, non-diverse evaluation sample.
📝 Abstract
We developed a task-oriented dialogue framework structured as a Directed Acyclic Graph (DAG) of medical questions. The system integrates: (1) a systematic pipeline for transforming medical algorithms and guidelines into a clinical question corpus; (2) a cold-start mechanism based on hierarchical clustering to generate efficient initial questioning without prior patient information; (3) an expand-and-prune mechanism enabling adaptive branching and backtracking based on patient responses; (4) termination logic that ends the interview once sufficient information has been gathered; and (5) automated synthesis of doctor-friendly structured reports aligned with clinical workflows. Human-computer interaction principles guided the design of both the patient and physician applications. Preliminary evaluation involved five physicians using standardized instruments: NASA-TLX (cognitive workload), the System Usability Scale (SUS), and the Questionnaire for User Interface Satisfaction (QUIS). The patient application achieved low workload scores (NASA-TLX = 15.6), high usability (SUS = 86), and strong satisfaction (QUIS = 8.1/9), with particularly high ratings for ease of learning and interface design. The physician application yielded moderate workload (NASA-TLX = 26) and excellent usability (SUS = 88.5), with a QUIS satisfaction score of 8.3/9. Both applications demonstrated effective integration into clinical workflows, reducing cognitive demand and supporting efficient report generation. Limitations included occasional system latency and a small, non-diverse evaluation sample.
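To make the expand-and-prune idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each question node lists the follow-up questions unlocked by each possible answer; the names `Question`, `run_interview`, `answer_fn`, and `max_questions`, as well as the toy questions, are hypothetical. Termination is simplified to "frontier empty or question budget reached", since the abstract does not specify the actual termination logic. Backtracking and the clustering-based cold start are omitted; the opening questions are simply passed in as `roots`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Question:
    qid: str
    text: str
    # Hypothetical structure: answer option -> ids of follow-up questions it unlocks.
    followups: Dict[str, List[str]] = field(default_factory=dict)


def run_interview(
    dag: Dict[str, Question],
    roots: List[str],
    answer_fn: Callable[[Question], str],
    max_questions: int = 20,
) -> Dict[str, str]:
    """Ask questions breadth-first, expanding matched branches and pruning the rest."""
    frontier = list(roots)        # questions queued to be asked
    asked: Dict[str, str] = {}    # qid -> recorded answer
    pruned = set()                # questions ruled out by earlier answers

    # Assumed stopping rule: nothing left to ask, or the budget is exhausted.
    while frontier and len(asked) < max_questions:
        qid = frontier.pop(0)
        if qid in asked or qid in pruned:
            continue
        question = dag[qid]
        answer = answer_fn(question)
        asked[qid] = answer

        for option, children in question.followups.items():
            for child in children:
                if option == answer:
                    # Expand: the answer makes this follow-up relevant.
                    pruned.discard(child)
                    if child not in asked and child not in frontier:
                        frontier.append(child)
                elif child not in frontier and child not in asked:
                    # Prune: the answer rules this branch out (unless already queued).
                    pruned.add(child)
    return asked


# Toy question graph (illustrative content only, not from the paper's corpus).
dag = {
    "chest_pain": Question("chest_pain", "Do you have chest pain?",
                           {"yes": ["pain_radiates", "pain_duration"], "no": ["cough"]}),
    "pain_radiates": Question("pain_radiates", "Does the pain spread to your arm or jaw?"),
    "pain_duration": Question("pain_duration", "How long has the pain lasted?"),
    "cough": Question("cough", "Do you have a cough?", {"yes": ["fever"]}),
    "fever": Question("fever", "Do you have a fever?"),
}

# Scripted patient answers standing in for a live dialogue.
scripted = {"chest_pain": "yes", "pain_radiates": "no", "pain_duration": "2 hours"}
report = run_interview(dag, ["chest_pain"], lambda q: scripted[q.qid])
```

In this toy run the "no" branch (`cough`, and therefore `fever`) is never asked, so `report` contains only the three chest-pain questions in the order they were asked. A structured report could then be rendered from `report` by grouping answers under clinical headings.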