A Novel, Human-in-the-Loop Computational Grounded Theory Framework for Big Social Data

📅 2025-06-06
🤖 AI Summary
To address the scalability–trustworthiness trade-off in large-scale social media qualitative analysis, this paper proposes a human-in-the-loop Computational Grounded Theory (CGT) framework. CGT is the first approach to deeply embed core grounded theory principles—constant comparison, theoretical sampling, and memo writing—into a closed-loop human–AI workflow, integrating topic modeling, semantic clustering, interactive annotation, active learning, and visual exploration to enable continuous researcher involvement in model decisions. Unlike conventional methods, CGT reconciles methodological rigor with computational efficiency, enabling interpretable, traceable, and reproducible computational qualitative analysis. Evaluated on a Reddit dataset of tutoring professionals, CGT-derived theories comprehensively cover dimensions including labor control, platform trust, and identity negotiation; intercoder reliability reaches 0.89, and theoretical saturation improves by 40% over baseline approaches.
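The summary names interactive annotation and active learning but does not describe how they are wired together. The following is a minimal sketch of one way such a loop could look, not the authors' implementation: the token-overlap scorer, the margin-based uncertainty measure, and every name in it (`active_coding_loop`, `ask_researcher`, the seed codes and posts) are illustrative assumptions. The model proposes which post it is least sure about, the researcher assigns the code, and each decision is logged so it stays traceable.

```python
def tokens(text):
    """Lower-cased word set; a stand-in for whatever text representation is used."""
    return set(text.lower().split())


def score(post, examples):
    """Mean Jaccard similarity between a post and the posts already given a code."""
    t = tokens(post)
    sims = []
    for example in examples:
        e = tokens(example)
        union = t | e
        sims.append(len(t & e) / len(union) if union else 0.0)
    return sum(sims) / len(sims)


def margin(post, coded):
    """Gap between the best and second-best code scores (small gap = uncertain)."""
    ranked = sorted((score(post, ex) for ex in coded.values()), reverse=True)
    if len(ranked) < 2:
        return 0.0  # with fewer than two codes, every post counts as uncertain
    return ranked[0] - ranked[1]


def active_coding_loop(uncoded, coded, ask_researcher, rounds):
    """Repeatedly send the most uncertain post to the researcher for coding.

    uncoded: iterable of post strings
    coded: dict mapping code name -> list of example posts (seed codes)
    ask_researcher: callable(post) -> code name; the human decision point
    Returns the updated codebook and a log of (post, code) decisions.
    """
    uncoded = list(uncoded)
    coded = {code: list(posts) for code, posts in coded.items()}  # leave input intact
    decisions = []
    for _ in range(min(rounds, len(uncoded))):
        post = min(uncoded, key=lambda p: margin(p, coded))
        code = ask_researcher(post)  # the researcher may also introduce a new code
        coded.setdefault(code, []).append(post)
        decisions.append((post, code))
        uncoded.remove(post)
    return coded, decisions


# Hypothetical seed codes and posts, for illustration only.
seed = {
    "labour control": ["the platform sets my hourly rate"],
    "platform trust": ["i do not trust the platform reviews"],
}
posts = [
    "the agency changed my rate without notice",
    "students trust reviews more than my degree",
    "i tutor maths in the evenings",
]


def stand_in_researcher(post):
    """Placeholder for an interactive prompt shown to a human coder."""
    return "platform trust" if "trust" in post else "labour control"


codebook, decisions = active_coding_loop(posts, seed, stand_in_researcher, rounds=2)
```

In a real study `ask_researcher` would be an annotation interface rather than a function, and the scorer would be replaced by a trained model. The part this sketch is meant to show is the control flow: the model only ranks candidates, and every code assignment passes through the researcher.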

📝 Abstract
The availability of big data has significantly influenced the possibilities and methodological choices for conducting large-scale behavioural and social science research. In the context of qualitative data analysis, a major challenge is that conventional methods require intensive manual labour and are often impractical to apply to large datasets. One effective way to address this issue is by integrating emerging computational methods to overcome scalability limitations. However, a critical concern for researchers is the trustworthiness of results when Machine Learning (ML) and Natural Language Processing (NLP) tools are used to analyse such data. We argue that confidence in the credibility and robustness of results depends on adopting a 'human-in-the-loop' methodology that is able to provide researchers with control over the analytical process, while retaining the benefits of using ML and NLP. With this in mind, we propose a novel methodological framework for Computational Grounded Theory (CGT) that supports the analysis of large qualitative datasets, while maintaining the rigour of established Grounded Theory (GT) methodologies. To illustrate the framework's value, we present the results of testing it on a dataset collected from Reddit in a study aimed at understanding tutors' experiences in the gig economy.
Problem

Research questions and friction points this paper is trying to address.

Addressing scalability in qualitative big data analysis
Ensuring trustworthiness of ML/NLP results in social research
Integrating human oversight with computational methods effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-in-the-loop computational grounded theory
Integrates ML and NLP for scalability
Ensures credibility via researcher control
Lama Alqazlan
Department of Computer Science, University of Warwick, UK
Zheng Fang
Department of Computer Science, University of Warwick, UK
Michael Castelle
Centre for Interdisciplinary Methodologies, University of Warwick, UK
Rob Procter
Professor of Social Informatics, University of Warwick; Faculty Fellow, Alan Turing Institute
Social Informatics, Data Science & AI, CSCW, Ethnography, Participatory Design