An Agentic AI System for Multi-Framework Communication Coding

📅 2025-12-09
🤖 AI Summary
High-quality coding of clinical doctor–patient dialogues is hampered by costly manual annotation, poor inter-annotator consistency, and limited cross-domain generalizability. To address these problems, we propose MOSAIC, a LangGraph-based multi-agent system with a four-agent collaborative architecture (codebook selection and workflow planning, retrieval-database updating, label generation, and consistency verification) that combines retrieval-augmented generation (RAG), dynamic few-shot prompting, and multi-agent coordination. MOSAIC is designed for automated, interpretable, and adaptive dialogue coding across multiple clinical communication frameworks. Evaluated on 50 real-world clinician–patient transcripts, it achieves an overall F1 score of 0.928; on the rheumatology subset, F1 reaches 0.962, and performance is strongest for patient behavior identification. These results suggest that the approach can make clinical communication analysis more scalable, more reliable, and easier to apply across clinical domains.

📝 Abstract
Clinical communication is central to patient outcomes, yet large-scale human annotation of patient-provider conversations remains labor-intensive, inconsistent, and difficult to scale. Existing approaches based on large language models typically rely on single-task models that lack adaptability, interpretability, and reliability, especially when applied across various communication frameworks and clinical domains. In this study, we developed a Multi-framework Structured Agentic AI system for Clinical Communication (MOSAIC), built on a LangGraph-based architecture that orchestrates four core agents: a Plan Agent for codebook selection and workflow planning, an Update Agent for maintaining up-to-date retrieval databases, a set of Annotation Agents that apply codebook-guided retrieval-augmented generation (RAG) with dynamic few-shot prompting, and a Verification Agent that provides consistency checks and feedback. To evaluate performance, we compared MOSAIC outputs against gold-standard annotations created by trained human coders. We developed and evaluated MOSAIC using 26 gold-standard annotated transcripts for training and 50 transcripts for testing, spanning the rheumatology and OB/GYN domains. On the test set, MOSAIC achieved an overall F1 score of 0.928. Performance was highest in the Rheumatology subset (F1 = 0.962) and strongest for Patient Behavior (e.g., patients asking questions, expressing preferences, or showing assertiveness). Ablation experiments showed that MOSAIC outperforms the baseline benchmarks.
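The abstract reports F1 scores against gold-standard human annotations but does not state how the scores were averaged. As a rough illustration only, the sketch below assumes micro-averaging over per-utterance label sets. The function name `micro_f1` and the toy labels are hypothetical and do not come from the paper.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over parallel lists of per-utterance label sets."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)   # labels the system and the human coder agree on
        fp += len(p - g)   # labels only the system assigned
        fn += len(g - p)   # labels only the human coder assigned
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): one label set per utterance.
gold = [{"patient_question"}, {"preference"}, set(), {"assertiveness"}]
pred = [{"patient_question"}, {"preference"}, {"preference"}, set()]
score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.3f}")
```

With two agreed labels, one spurious label, and one missed label, precision and recall are both 2/3, so the toy score is 2/3. A per-domain figure such as the Rheumatology subset score would be obtained by applying the same function to that subset's utterances only.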
Problem

Research questions and friction points this paper is trying to address.

Automates clinical communication coding across multiple frameworks
Addresses labor-intensive and inconsistent human annotation challenges
Enhances adaptability and reliability in diverse clinical domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent LangGraph architecture orchestrates four specialized agents
Codebook-guided RAG with dynamic few-shot prompting for annotation
Verification agent ensures consistency and provides feedback loops
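The three ideas listed above can be sketched as a retrieve, annotate, verify loop. This is a minimal plain-Python sketch under stated assumptions, not the authors' implementation: it does not use LangGraph, the toy codebook is invented, token overlap stands in for the retrieval step, and a nearest-example rule stands in for the LLM call. All function and variable names are hypothetical.

```python
# Hypothetical toy codebook: code name -> labeled example utterances.
CODEBOOK = {
    "patient_question": ["What are the side effects?", "How long will this take?"],
    "patient_preference": ["I would rather avoid injections.", "I prefer the morning dose."],
}


def tokens(text):
    """Lowercase bag of words with basic punctuation removed."""
    return set(text.lower().replace("?", " ").replace(".", " ").split())


def retrieve_examples(utterance, k=2):
    """Dynamic few-shot selection: rank stored examples by Jaccard token overlap."""
    query = tokens(utterance)
    scored = []
    for code, examples in CODEBOOK.items():
        for example in examples:
            example_tokens = tokens(example)
            overlap = len(query & example_tokens) / len(query | example_tokens)
            scored.append((overlap, code, example))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def annotate(utterance, shots):
    """Stand-in for the LLM call: adopt the label of the closest retrieved example."""
    best_overlap, code, _ = shots[0]
    return code if best_overlap > 0 else "none"


def verify(label):
    """Consistency check: the label must be a known code or the explicit 'none'."""
    return label in CODEBOOK or label == "none"


def code_utterance(utterance, max_rounds=2):
    """Retrieve, annotate, verify; on rejection, retry with more retrieved examples."""
    for round_idx in range(max_rounds):
        shots = retrieve_examples(utterance, k=2 + round_idx)
        label = annotate(utterance, shots)
        if verify(label):
            return label
    return "none"


print(code_utterance("What are the risks?"))
print(code_utterance("I prefer pills"))
```

In this toy version the verifier accepts every label the annotator can produce, so the retry branch never fires; with a real model in place of `annotate`, the same loop would route rejected labels back for another attempt with a larger set of retrieved examples.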
Bohao Yang
University of Manchester
NLP, Dialogue Generation, Dialogue Evaluation, Table Understanding, LLMs

Rui Yang
Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore

Joshua M. Biro
Medstar Health National Center for Human Factors in Healthcare, Washington, DC, USA

Haoyuan Wang
University of Pennsylvania, Applied Mathematics and Computational Science
Biostatistics

Jessica L. Handley
Medstar Health National Center for Human Factors in Healthcare, Washington, DC, USA

Brianna Richardson
Cancer Prevention and Control Research Program, Duke Cancer Institute, Durham, NC, USA

Sophia Bessias
Department of Biostatistics and Bioinformatics, Duke School of Medicine, Durham, NC, USA

Nicoleta Economou-Zavlanos
Department of Biostatistics and Bioinformatics, Duke School of Medicine, Durham, NC, USA

Armando D. Bedoya
Department of Biostatistics and Bioinformatics, Duke School of Medicine, Durham, NC, USA; Department of Medicine, Duke School of Medicine, Durham, NC, USA

Monica Agrawal
Assistant Professor, Duke

Michael M. Zavlanos
Duke University
Control, Optimization, Machine Learning, Robotics, Cyber-Physical Systems

Anand Chowdhury
Department of Mechanical Engineering & Materials Science, Duke University, Durham, NC, USA

Raj M. Ratwani
Medstar Health National Center for Human Factors in Healthcare, Washington, DC, USA

Kai Sun
Department of Mechanical Engineering & Materials Science, Duke University, Durham, NC, USA

Kathryn I. Pollak
Cancer Prevention and Control Research Program, Duke Cancer Institute, Durham, NC, USA; Department of Population Health Sciences, Duke School of Medicine, Durham, NC, USA

Michael J. Pencina
Department of Biostatistics and Bioinformatics, Duke School of Medicine, Durham, NC, USA

Chuan Hong
Department of Biostatistics and Bioinformatics, Duke School of Medicine, Durham, NC, USA