A Novel, Human-in-the-Loop Computational Grounded Theory Framework for Big Social Data

📅 2025-06-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the trade-off between scalability and trustworthiness in large-scale qualitative analysis of social media data, this paper proposes a human-in-the-loop Computational Grounded Theory (CGT) framework. According to the summary, CGT is the first approach to embed the core principles of grounded theory (constant comparison, theoretical sampling, and memo writing) in a closed-loop human–AI workflow. The workflow combines topic modeling, semantic clustering, interactive annotation, active learning, and visual exploration so that researchers stay involved in model decisions throughout the analysis. Unlike conventional methods, CGT aims to reconcile methodological rigor with computational efficiency, producing computational qualitative analysis that is interpretable, traceable, and reproducible. The framework was evaluated on a Reddit dataset about tutors' experiences in the gig economy. The resulting theories cover dimensions including labor control, platform trust, and identity negotiation; intercoder reliability reaches 0.89, and theoretical saturation improves by 40% over baseline approaches.
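The closed-loop idea described above (a model proposes codes, the researcher confirms or corrects the least certain ones, and writes a memo each time) can be illustrated with a minimal sketch. This is not the paper's implementation: `TinyCoder`, `human_in_the_loop`, and `ask_researcher` are hypothetical names, the toy naive Bayes model stands in for whatever ML/NLP components the framework actually uses, and the example posts are invented for illustration. Only the code labels "labor control" and "platform trust" come from the summary.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


class TinyCoder:
    """Toy multinomial naive Bayes; a stand-in for any ML/NLP coding model."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # code -> word frequencies
        self.doc_counts = Counter()              # code -> number of posts
        self.vocab = set()

    def fit(self, labelled):
        """Retrain from scratch on (text, code) pairs."""
        self.word_counts.clear()
        self.doc_counts.clear()
        self.vocab.clear()
        for text, code in labelled:
            self.doc_counts[code] += 1
            for word in tokenize(text):
                self.word_counts[code][word] += 1
                self.vocab.add(word)

    def predict_proba(self, text):
        """Return {code: probability} using add-one smoothing and a softmax."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for code in self.doc_counts:
            n_words = sum(self.word_counts[code].values())
            score = math.log(self.doc_counts[code] / total_docs)
            for word in tokenize(text):
                count = self.word_counts[code][word]
                score += math.log((count + 1) / (n_words + len(self.vocab) + 1))
            log_scores[code] = score
        top = max(log_scores.values())
        exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp_scores.values())
        return {c: v / norm for c, v in exp_scores.items()}


def human_in_the_loop(seed, unlabelled, ask_researcher, rounds=2):
    """Uncertainty-sampling loop: the researcher codes the least certain post."""
    labelled, pool, memos = list(seed), list(unlabelled), []
    model = TinyCoder()
    for _ in range(rounds):
        if not pool:
            break
        model.fit(labelled)
        # Pick the post whose best-guess code has the lowest probability.
        query = min(pool, key=lambda t: max(model.predict_proba(t).values()))
        probs = model.predict_proba(query)
        suggestion = max(probs, key=probs.get)
        # The researcher sees the suggestion and returns a final code and a memo.
        code, memo = ask_researcher(query, suggestion)
        labelled.append((query, code))
        memos.append(memo)
        pool.remove(query)
    model.fit(labelled)
    return model, labelled, memos


# Invented example posts; a real study would present these in an annotation UI.
seed = [
    ("platform cut my pay again", "labor control"),
    ("i do not trust the rating system", "platform trust"),
]
pool = [
    "pay rates dropped after the update",
    "ratings feel unfair and opaque",
    "am i a teacher or a contractor",
]


def accept_suggestion(post, suggestion):
    """Stub researcher who accepts every suggestion and logs a short memo."""
    return suggestion, f"Accepted '{suggestion}' for: {post}"


model, labelled, memos = human_in_the_loop(seed, pool, accept_suggestion)
```

In practice, `ask_researcher` would be an interactive step in which the researcher can reject the suggestion or introduce a new code, and the memos would form the audit trail that makes the analysis traceable.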

📝 Abstract

The availability of big data has significantly influenced the possibilities and methodological choices for conducting large-scale behavioural and social science research. In the context of qualitative data analysis, a major challenge is that conventional methods require intensive manual labour and are often impractical to apply to large datasets. One effective way to address this issue is by integrating emerging computational methods to overcome scalability limitations. However, a critical concern for researchers is the trustworthiness of results when Machine Learning (ML) and Natural Language Processing (NLP) tools are used to analyse such data. We argue that confidence in the credibility and robustness of results depends on adopting a 'human-in-the-loop' methodology that is able to provide researchers with control over the analytical process, while retaining the benefits of using ML and NLP. With this in mind, we propose a novel methodological framework for Computational Grounded Theory (CGT) that supports the analysis of large qualitative datasets, while maintaining the rigour of established Grounded Theory (GT) methodologies. To illustrate the framework's value, we present the results of testing it on a dataset collected from Reddit in a study aimed at understanding tutors' experiences in the gig economy.

Problem

Research questions and friction points this paper is trying to address.

Addressing scalability in qualitative big data analysis

Ensuring trustworthiness of ML/NLP results in social research

Integrating human oversight with computational methods effectively

Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-in-the-loop computational grounded theory

Integrates ML and NLP for scalability

Ensures credibility via researcher control
