Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning

📅 2025-06-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Thematic analysis of unstructured clinical narratives, such as patient and caregiver accounts of congenital heart disease (CHD), relies on labor-intensive manual coding, scales poorly, and struggles to capture lived experiences systematically. Method: We propose a multi-agent large language model (LLM) framework with role-based collaboration and optional reinforcement learning from human feedback (RLHF), which extracts themes end to end from raw narratives without requiring pre-defined coding schemas. Contribution/Results: The approach improves thematic relevance, consistency, and interpretability. Empirical evaluation reports strong agreement with expert manual coding (Cohen’s κ > 0.85), supporting scalable, patient-centered qualitative research. The work offers an approach to automated semantic mining of clinical narrative data.
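
For readers unfamiliar with the agreement statistic cited in the summary: Cohen's κ compares the observed agreement p_o between two raters with the agreement p_e expected by chance, as κ = (p_o − p_e) / (1 − p_e). The sketch below is a minimal illustration of that formula, not the authors' evaluation code; the function name `cohens_kappa` and the toy theme labels are assumptions made for the example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies,
    # summed over all labels either rater used.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy example (hypothetical labels): an expert coder versus an automated pipeline.
expert = ["coping", "coping", "access", "access"]
pipeline = ["coping", "coping", "access", "coping"]
kappa = cohens_kappa(expert, pipeline)
```

In this toy case the raters agree on three of four items (p_o = 0.75) and chance agreement is 0.5, so κ = 0.5. A value above 0.85, as reported in the summary, indicates far stronger agreement than this.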

📝 Abstract
Congenital heart disease (CHD) presents complex, lifelong challenges often underrepresented in traditional clinical metrics. While unstructured narratives offer rich insights into patient and caregiver experiences, manual thematic analysis (TA) remains labor-intensive and unscalable. We propose a fully automated large language model (LLM) pipeline that performs end-to-end TA on clinical narratives, which eliminates the need for manual coding or full transcript review. Our system employs a novel multi-agent framework, where specialized LLM agents assume roles to enhance theme quality and alignment with human analysis. To further improve thematic relevance, we optionally integrate reinforcement learning from human feedback (RLHF). This supports scalable, patient-centered analysis of large qualitative datasets and allows LLMs to be fine-tuned for specific clinical contexts.
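
The abstract does not name the agent roles or say how agents exchange output, so the following is only a sketch of what a role-based, sequential multi-agent pass over a narrative could look like. Everything in it is an assumption for illustration: the `Agent` class, the role names `coder` and `reviewer`, the `run_pipeline` helper, and the `echo_llm` stub that stands in for a real model call. The optional RLHF step is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: an LLM is any function mapping a prompt to a completion.
LLM = Callable[[str], str]


@dataclass
class Agent:
    role: str          # hypothetical role name, e.g. "coder" or "reviewer"
    instruction: str   # role-specific instruction prepended to the input
    llm: LLM

    def run(self, text: str) -> str:
        # Build a role-tagged prompt and delegate to the underlying model.
        return self.llm(f"[{self.role}] {self.instruction}\n\n{text}")


def run_pipeline(agents: List[Agent], narrative: str) -> str:
    """Pass the narrative through each agent in order, chaining outputs."""
    output = narrative
    for agent in agents:
        output = agent.run(output)
    return output


def echo_llm(prompt: str) -> str:
    """Stub model: returns the prompt body tagged with the role that handled it."""
    role = prompt.split("]")[0].lstrip("[")
    body = prompt.split("\n\n", 1)[1]
    return f"{body} -> {role}"


agents = [
    Agent("coder", "Assign initial codes to the narrative.", echo_llm),
    Agent("reviewer", "Check the codes and merge them into themes.", echo_llm),
]
result = run_pipeline(agents, "narrative text")
```

With the stub, `result` simply records the order in which roles handled the text. Replacing `echo_llm` with a real model client would be the first step toward a working pipeline, and the paper's actual design may differ from this linear chain.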
Problem

Research questions and friction points this paper is trying to address.

Automating thematic analysis of clinical narratives for scalability
Reducing labor-intensive manual coding in CHD patient insights
Enhancing theme quality via multi-agent LLMs and RLHF
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated LLM pipeline for thematic analysis
Multi-agent framework enhances theme quality
Optional RLHF integration improves thematic relevance
Seungjun Yi
Department of Biomedical Engineering, University of Texas at Austin
Joakim Nguyen
School of Information, University of Texas at Austin
Huimin Xu
Ph.D. Student, School of Information, University of Texas at Austin
computational social science, science of team science
Terence Lim
College of Natural Sciences, University of Texas at Austin
Andrew Well
Research Assistant Professor of Cardiac Surgery
Mia Markey
Department of Biomedical Engineering, University of Texas at Austin
Ying Ding
Bill & Lewis Suit Professor, School of Information, Dell Med, University of Texas at Austin
AI in Health, Knowledge Graph, Science of Science