LOGOS: LLM-driven End-to-End Grounded Theory Development and Schema Induction for Qualitative Research

📅 2025-09-29
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Grounded theory (GT) in qualitative research suffers from poor scalability due to its reliance on expert-intensive manual coding, and existing computational tools fail to achieve true automation. This paper introduces the first end-to-end automated GT workflow, integrating large language model-driven initial coding, semantic clustering, graph-structured modeling, and iterative refinement to generate hierarchical theoretical frameworks autonomously. We propose a novel five-dimensional evaluation metric and a reusable codebook construction paradigm. Evaluated across five heterogeneous corpora, our method significantly outperforms strong baselines; on complex datasets, the generated theories achieve 88.2% alignment with expert-derived coding patterns. The approach substantially enhances analytical efficiency, reproducibility, and cross-domain applicability while preserving methodological rigor.

📝 Abstract
Grounded theory offers deep insights from qualitative data, but its reliance on expert-intensive manual coding presents a major scalability bottleneck. Current computational tools stop short of true automation, keeping researchers firmly in the loop. We introduce LOGOS, a novel, end-to-end framework that fully automates the grounded theory workflow, transforming raw text into a structured, hierarchical theory. LOGOS integrates LLM-driven coding, semantic clustering, graph reasoning, and a novel iterative refinement process to build highly reusable codebooks. To ensure fair comparison, we also introduce a principled 5-dimensional metric and a train-test split protocol for standardized, unbiased evaluation. Across five diverse corpora, LOGOS consistently outperforms strong baselines and achieves a remarkable 88.2% alignment with an expert-developed schema on a complex dataset. LOGOS demonstrates a powerful new path to democratize and scale qualitative research without sacrificing theoretical nuance.
Problem

Research questions and friction points this paper is trying to address.

Automating grounded theory development from qualitative data
Overcoming scalability bottlenecks in manual coding processes
Generating structured hierarchical theories through computational methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven coding automates grounded theory workflow
Semantic clustering and graph reasoning build hierarchical theories
Iterative refinement process generates reusable codebooks
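As a rough illustration of the clustering step named above, the sketch below groups short open codes into candidate categories and collects them into a codebook. It is not the LOGOS implementation: token-overlap (Jaccard) similarity stands in for the semantic embeddings a real system would likely use, the greedy single-pass grouping is an assumed simplification, and the names cluster_codes, build_codebook, and the 0.25 threshold are hypothetical choices made for this example.

```python
def tokens(code):
    """Lowercased word set of an open code (a crude stand-in for an embedding)."""
    return frozenset(code.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_codes(codes, threshold=0.25):
    """Greedy single-pass clustering (assumed simplification).

    Each code joins the first cluster whose seed (first member) is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for code in codes:
        for cluster in clusters:
            if jaccard(tokens(code), tokens(cluster[0])) >= threshold:
                cluster.append(code)
                break
        else:
            clusters.append([code])
    return clusters


def build_codebook(clusters):
    """Map each cluster's seed code (used as the category label) to its members."""
    return {cluster[0]: list(cluster) for cluster in clusters}


# Hypothetical open codes, as an LLM might produce from interview excerpts.
open_codes = [
    "lack of time",
    "lack of funding",
    "peer support",
    "support from peers",
    "time pressure",
]
codebook = build_codebook(cluster_codes(open_codes))
```

With these inputs the codes fall into two categories, one seeded by "lack of time" and one by "peer support". A seed-based label is only a placeholder; in a pipeline like the one described, a category name would more plausibly be generated and then revised during iterative refinement.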
Xinyu Pi
University of California, San Diego, La Jolla, CA, USA
Qisen Yang
University of California, San Diego, La Jolla, CA, USA
Chuong Nguyen
University of Southern California
Robotics, Optimization, Learning, Control, Game Theory and Multi-agent System