Auto-TA: Towards Scalable Automated Thematic Analysis (TA) via Multi-Agent Large Language Models with Reinforcement Learning

📅 2025-06-30

🤖 AI Summary

Background: Thematic analysis of unstructured clinical narratives, such as patient and caregiver accounts of congenital heart disease (CHD), is hampered by labor-intensive manual coding, poor scalability, and the difficulty of systematically capturing authentic lived experience. Method: We propose a multi-agent large language model (LLM) framework with role-based collaboration and optional human-in-the-loop reinforcement learning from human feedback (RLHF). It extracts high-quality themes end to end, directly from raw narratives, without requiring a predefined coding scheme. Contribution/Results: The approach significantly improves thematic relevance, consistency, and interpretability. Empirical evaluation shows strong agreement with expert manual coding (Cohen's κ > 0.85), supporting scalable, patient-centered qualitative research. This work establishes a new paradigm for automated, in-depth semantic analysis of clinical narrative data.
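Cohen's κ corrects raw inter-rater agreement for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed fraction of matching labels and p_e is the chance agreement implied by each rater's label frequencies. The summary does not say how themes were mapped to rated units, so this minimal sketch assumes each excerpt receives exactly one theme label from the human coder and one from the LLM pipeline. The function name `cohens_kappa` and the example labels are illustrative, not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who each assign one label per item."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement p_o: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: product of the raters' marginal frequencies, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label everywhere; kappa is undefined, report 1 by convention.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical theme labels for six narrative excerpts.
    human = ["cost", "stress", "stress", "support", "cost", "support"]
    llm = ["cost", "stress", "support", "support", "cost", "stress"]
    print(round(cohens_kappa(human, llm), 3))  # prints 0.5
```

In the hypothetical example the raters agree on 4 of 6 excerpts (p_o = 2/3), chance agreement is p_e = 1/3, and so κ = 0.5, well short of the 0.85 figure reported above.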

📝 Abstract

Congenital heart disease (CHD) presents complex, lifelong challenges that are often underrepresented in traditional clinical metrics. While unstructured narratives offer rich insights into patient and caregiver experiences, manual thematic analysis (TA) remains labor-intensive and unscalable. We propose a fully automated large language model (LLM) pipeline that performs end-to-end TA on clinical narratives, eliminating the need for manual coding or full transcript review. Our system employs a novel multi-agent framework in which specialized LLM agents take on distinct roles to improve theme quality and alignment with human analysis. To further improve thematic relevance, we optionally integrate reinforcement learning from human feedback (RLHF). The resulting pipeline supports scalable, patient-centered analysis of large qualitative datasets and allows LLMs to be fine-tuned for specific clinical contexts.
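The abstract describes role-specialized LLM agents collaborating on theme extraction but does not name the roles. The sketch below shows one plausible arrangement, purely as an assumption: a hypothetical coder agent labels each narrative, a theme-synthesizer agent merges the codes into themes, and a reviewer agent either approves or returns feedback for another revision. `Agent`, `thematic_analysis`, the role prompts, and the offline `fake_llm` stub are all illustrative; a real deployment would replace `fake_llm` with a call to an actual model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumed interface: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


@dataclass
class Agent:
    """A role-specialized agent: a fixed role prompt wrapped around a shared LLM."""
    role: str
    instructions: str
    llm: LLM

    def run(self, payload: str) -> str:
        # Role header and payload are separated by a blank line.
        return self.llm(f"You are the {self.role}. {self.instructions}\n\n{payload}")


def thematic_analysis(narratives: List[str], coder: Agent, synthesizer: Agent,
                      reviewer: Agent, max_rounds: int = 2) -> List[str]:
    """Hypothetical three-role pipeline: code -> synthesize themes -> review/revise."""
    # Stage 1: the coder labels each narrative independently.
    codes = [coder.run(text) for text in narratives]
    # Stage 2: the synthesizer merges the codes into candidate themes, one per line.
    themes = synthesizer.run("\n".join(codes)).splitlines()
    # Stage 3: the reviewer either approves or returns feedback for another revision.
    for _ in range(max_rounds):
        verdict = reviewer.run("\n".join(themes))
        if verdict.strip().upper() == "APPROVE":
            break
        feedback = "\n".join(codes) + "\nReviewer feedback: " + verdict
        themes = synthesizer.run(feedback).splitlines()
    return [t.strip() for t in themes if t.strip()]


def fake_llm(prompt: str) -> str:
    """Deterministic offline stand-in for a real model call (demo only)."""
    header, payload = prompt.split("\n\n", 1)
    if header.startswith("You are the coder"):
        text = payload.lower()
        return "financial burden" if ("cost" in text or "insurance" in text) else "emotional strain"
    if header.startswith("You are the theme synthesizer"):
        lines = [ln for ln in payload.splitlines() if not ln.startswith("Reviewer feedback")]
        return "\n".join(sorted({ln.capitalize() for ln in lines}))
    return "APPROVE"  # the reviewer stub always approves


if __name__ == "__main__":
    coder = Agent("coder", "Assign one short code to the excerpt.", fake_llm)
    synth = Agent("theme synthesizer", "Group the codes into themes.", fake_llm)
    review = Agent("reviewer", "Reply APPROVE or give feedback.", fake_llm)
    stories = ["Insurance denied the surgery cost twice.",
               "I lie awake worrying before every check-up."]
    print(thematic_analysis(stories, coder, synth, review))
```

Keeping every role behind the same `LLM` callable type means the roles differ only in their prompts, so swapping in a different model, or fine-tuning it as the abstract suggests, leaves the orchestration unchanged.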

Problem

Automating thematic analysis of clinical narratives for scalability

Reducing the labor-intensive manual coding required to extract insights from CHD patient narratives

Enhancing theme quality via multi-agent LLMs and RLHF

Innovation

Fully automated, end-to-end LLM pipeline for thematic analysis

Multi-agent framework in which role-specialized agents improve theme quality and alignment with human analysis

Optional RLHF integration improves thematic relevance
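RLHF commonly begins by training a reward model on human preference pairs with the Bradley-Terry loss, -log σ(r(chosen) - r(rejected)), before any policy optimization. The summary does not specify the paper's feedback format, reward model, or optimizer, so this sketch covers only that reward-modeling step under simplifying assumptions: each candidate theme is reduced to a hypothetical two-number feature vector (grounding in quotes, redundancy with other themes), and the reward is linear in those features. The function names and feature values are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Features = Sequence[float]


def reward(w: Sequence[float], x: Features) -> float:
    """Linear reward model r(x) = w . x (a stand-in for a learned neural scorer)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def preference_loss(w: Sequence[float], chosen: Features, rejected: Features) -> float:
    """Bradley-Terry loss -log(sigmoid(margin)), computed as a stable softplus(-margin)."""
    z = -(reward(w, chosen) - reward(w, rejected))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def train_reward_model(pairs: List[Tuple[Features, Features]], dim: int,
                       lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Plain stochastic gradient descent on the pairwise preference loss."""
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # d(loss)/dw = -sigmoid(-margin) * (chosen - rejected), so step the opposite way.
            scale = lr * sigmoid(-margin)
            w = [wi + scale * (c - r) for wi, c, r in zip(w, chosen, rejected)]
    return w


if __name__ == "__main__":
    # Hypothetical features per theme: [grounding in quotes, redundancy with other themes].
    # In each pair a human reviewer preferred the first theme over the second.
    pairs = [([0.9, 0.1], [0.2, 0.8]), ([0.7, 0.3], [0.6, 0.9])]
    w = train_reward_model(pairs, dim=2)
    for chosen, rejected in pairs:
        print(round(preference_loss(w, chosen, rejected), 4))
```

A real system would score themes with a neural reward model over the text itself and then use the learned reward to fine-tune the LLM agents (for example with PPO); the linear model here only keeps the loss and its gradient easy to follow.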

🔎 Similar Papers

System for systematic literature review using multiple AI agents: Concept and an empirical evaluation