🤖 AI Summary
Thematic analysis of unstructured clinical narrative text, such as patient and caregiver accounts in congenital heart disease (CHD), faces several challenges: labor-intensive manual coding, poor scalability, and difficulty systematically capturing authentic lived experiences.
Method: We propose a multi-agent large language model (LLM) framework featuring role-based collaboration and optional reinforcement learning from human feedback (RLHF), enabling end-to-end automated extraction of high-quality themes directly from raw narratives, without requiring pre-defined coding schemas.
Contribution/Results: The approach improves thematic relevance, consistency, and interpretability. Empirical evaluation demonstrates strong agreement with expert manual coding (Cohen’s κ > 0.85), supporting scalable, patient-centered qualitative research. This work offers a new approach to automated, deep semantic mining of clinical narrative data.
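For readers unfamiliar with the agreement statistic cited above, Cohen's κ compares the observed agreement between two coders with the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below is a minimal illustration of that standard formula, not the evaluation code used in this work. The function name `cohens_kappa` and the example theme labels are hypothetical, and it assumes each narrative receives exactly one theme label from each coder.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who each assign one label per item."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of each coder's label rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    all_labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in all_labels) / (n * n)

    if expected == 1.0:
        # Kappa is undefined when both coders use one identical label throughout;
        # this sketch returns 1.0 because they agree on every item (an assumption).
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical example: theme labels assigned by an expert and by the pipeline.
expert = ["anxiety", "anxiety", "support", "support"]
pipeline = ["anxiety", "anxiety", "support", "anxiety"]
kappa = cohens_kappa(expert, pipeline)
```

In this toy example the coders agree on three of four items (p_o = 0.75) and chance agreement is p_e = 0.5, so κ = 0.5.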
📝 Abstract
Congenital heart disease (CHD) presents complex, lifelong challenges that are often underrepresented in traditional clinical metrics. While unstructured narratives offer rich insights into patient and caregiver experiences, manual thematic analysis (TA) remains labor-intensive and unscalable. We propose a fully automated large language model (LLM) pipeline that performs end-to-end TA on clinical narratives, eliminating the need for manual coding or full transcript review. Our system employs a novel multi-agent framework in which specialized LLM agents assume distinct roles to enhance theme quality and alignment with human analysis. To further improve thematic relevance, we optionally integrate reinforcement learning from human feedback (RLHF). This supports scalable, patient-centered analysis of large qualitative datasets and allows LLMs to be fine-tuned for specific clinical contexts.
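The role-based collaboration described above could be organized as a chain of agents, each wrapping the same model with different role instructions. The sketch below shows only that general pattern and is not the authors' implementation. The abstract does not name the roles, so the "coder" and "reviewer" agents, the `Agent` class, `run_pipeline`, and the `stub_llm` stand-in for a real model call are all hypothetical, and the optional RLHF step is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model interface: takes a prompt string, returns a completion string.
LLM = Callable[[str], str]


@dataclass
class Agent:
    role: str          # hypothetical role name, e.g. "coder" or "reviewer"
    instructions: str  # role-specific guidance prepended to the input
    llm: LLM

    def run(self, text: str) -> str:
        prompt = f"Role: {self.role}\n{self.instructions}\n\nInput:\n{text}"
        return self.llm(prompt)


def run_pipeline(narratives: List[str], agents: List[Agent]) -> str:
    """Feed the narratives to the first agent, then pass each output onward."""
    current = "\n---\n".join(narratives)
    for agent in agents:
        current = agent.run(current)
    return current


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a model call, used only to exercise the flow."""
    role = prompt.splitlines()[0][len("Role: "):]
    body = prompt.split("Input:\n", 1)[1]
    return f"{body} | {role} done"


# Assumed two-role setup; a real system would call an actual LLM here.
agents = [
    Agent("coder", "Propose candidate themes for the narratives.", stub_llm),
    Agent("reviewer", "Merge overlapping themes and flag weak ones.", stub_llm),
]
result = run_pipeline(["My son had surgery at six months."], agents)
```

Keeping the model behind a plain callable makes it easy to swap the stub for a real client and to test the orchestration logic without network access.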