LOGOS: LLM-driven End-to-End Grounded Theory Development and Schema Induction for Qualitative Research

📅 2025-09-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Grounded theory (GT) scales poorly in qualitative research because it relies on expert-intensive manual coding, and existing computational tools stop short of full automation. This paper introduces LOGOS, an end-to-end automated GT workflow that integrates large language model-driven initial coding, semantic clustering, graph-structured modeling, and iterative refinement to generate hierarchical theoretical frameworks from raw text. It also proposes a five-dimensional evaluation metric, a train-test split protocol, and a reusable codebook construction paradigm. Across five diverse corpora, the method consistently outperforms strong baselines; on a complex dataset, the generated theory achieves 88.2% alignment with an expert-developed schema. The authors present the approach as a way to scale qualitative research without sacrificing theoretical nuance.

📝 Abstract

Grounded theory offers deep insights from qualitative data, but its reliance on expert-intensive manual coding presents a major scalability bottleneck. Current computational tools stop short of true automation, keeping researchers firmly in the loop. We introduce LOGOS, a novel, end-to-end framework that fully automates the grounded theory workflow, transforming raw text into a structured, hierarchical theory. LOGOS integrates LLM-driven coding, semantic clustering, graph reasoning, and a novel iterative refinement process to build highly reusable codebooks. To ensure fair comparison, we also introduce a principled 5-dimensional metric and a train-test split protocol for standardized, unbiased evaluation. Across five diverse corpora, LOGOS consistently outperforms strong baselines and achieves a remarkable 88.2% alignment with an expert-developed schema on a complex dataset. LOGOS demonstrates a powerful new path to democratize and scale qualitative research without sacrificing theoretical nuance.

Problem

Research questions and friction points this paper is trying to address.

Automating grounded theory development from qualitative data

Overcoming scalability bottlenecks in manual coding processes

Generating structured hierarchical theories through computational methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven coding automates grounded theory workflow

Semantic clustering and graph reasoning build hierarchical theories

Iterative refinement process generates reusable codebooks
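To make the clustering step concrete, here is a minimal sketch of how open codes might be grouped into candidate categories through a similarity graph. It is not the LOGOS implementation: the function names, the word-overlap (Jaccard) similarity, the 0.3 threshold, and the sample codes are all assumptions chosen for illustration. A real pipeline would more plausibly use LLM-generated codes and embedding-based similarity.

```python
from itertools import combinations


def tokens(code):
    """Lowercased word set for one open code (a short descriptive label)."""
    return set(code.lower().split())


def jaccard(a, b):
    """Word-overlap similarity; an assumed stand-in for embedding similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_codes(codes, threshold=0.3):
    """Group codes into connected components of a thresholded similarity graph."""
    parent = list(range(len(codes)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    toks = [tokens(c) for c in codes]
    for i, j in combinations(range(len(codes)), 2):
        # An edge exists when two codes are similar enough; merge their groups.
        if jaccard(toks[i], toks[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i, code in enumerate(codes):
        groups.setdefault(find(i), []).append(code)
    return list(groups.values())


# Hypothetical open codes, as an LLM coding pass might emit them.
codes = [
    "fear of job loss",
    "fear of losing job",
    "trust in management",
    "lack of trust in management",
    "remote work flexibility",
]
clusters = cluster_codes(codes)
for group in clusters:
    print(group)
```

Each resulting group is a candidate higher-level category. An iterative refinement pass could then merge, split, or relabel groups before they enter a codebook.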

🔎 Similar Papers

Exploring the Potential of Human-LLM Synergy in Advancing Qualitative Analysis: A Case Study on Mental-Illness Stigma