Processes Matter: How ML/GAI Approaches Could Support Open Qualitative Coding of Online Discourse Datasets

📅 2025-04-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses key challenges in open qualitative coding of large online discourse datasets: concepts are hard to discover, manual coding is labor-intensive, and extensive or nuanced coding moments are easy to miss. It compares the open coding results of five recently published machine learning and generative AI approaches with those of four human coders, using a dataset of online chat messages about a mobile learning software. The analysis finds that line-by-line AI approaches effectively identify content-based codes, whereas humans are better at interpreting conversational dynamics. The authors also discuss how the analytical process embedded in each approach shapes its results, and they argue that researchers should integrate AI into their own analytical processes, for example as parallel co-coders, rather than use it to replace human judgment.

📝 Abstract

Open coding, a key inductive step in qualitative research, discovers and constructs concepts from human datasets. However, capturing extensive and nuanced aspects, or "coding moments", can be challenging, especially with large discourse datasets. While some studies explore the potential of machine learning (ML) and Generative AI (GAI) for open coding, few evaluation studies exist. We compare open coding results from five recently published ML/GAI approaches and four human coders, using a dataset of online chat messages around a mobile learning software. Our systematic analysis reveals the strengths and weaknesses of ML/GAI approaches, uncovering the complementary potential between humans and AI. Line-by-line AI approaches effectively identify content-based codes, while humans excel in interpreting conversational dynamics. We discuss how embedded analytical processes could shape the results of ML/GAI approaches. Instead of replacing humans in open coding, researchers should integrate AI with and according to their analytical processes, e.g., as parallel co-coders.

Problem

Research questions and friction points this paper is trying to address.

Evaluating ML/GAI for open coding in qualitative research

Comparing human and AI performance in coding online discourse

Exploring complementary human-AI integration in analytical processes

Innovation

Methods, ideas, or system contributions that make the work stand out.

Compare ML/GAI and human open coding results

AI identifies content, humans interpret dynamics

Integrate AI as parallel co-coders
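
To make the "parallel co-coders" idea concrete, here is a minimal sketch of how codes assigned independently by a human and by an AI approach could be laid side by side for review. It is not the paper's method: the function name `compare_coders`, the data layout (line ID mapped to a set of code labels), and the sample codes are all assumptions made for illustration. Exact string matching of labels is also a simplification, since open codes from different coders rarely share wording.

```python
def compare_coders(human, ai):
    """Compare codes from two independent coders, line by line.

    human, ai: dict mapping a line ID to a set of code labels
    (hypothetical layout, not taken from the paper).
    Returns, for each line, the codes both coders assigned and the
    codes only one of them assigned.
    """
    report = {}
    for line_id in sorted(set(human) | set(ai)):
        h = human.get(line_id, set())
        a = ai.get(line_id, set())
        report[line_id] = {
            "shared": h & a,        # both coders proposed this code
            "human_only": h - a,    # candidate for dynamics the AI missed
            "ai_only": a - h,       # candidate for content the human missed
        }
    return report


if __name__ == "__main__":
    # Hypothetical codes for three chat messages.
    human_codes = {1: {"feature request"}, 2: {"peer support", "feature request"}}
    ai_codes = {1: {"feature request", "bug report"}, 3: {"greeting"}}
    for line_id, row in compare_coders(human_codes, ai_codes).items():
        print(line_id, row)
```

In this arrangement neither coder overrides the other: the `human_only` and `ai_only` sets are prompts for the researcher to revisit a line, which matches the summary's point that AI is meant to complement human judgment rather than replace it.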

🔎 Similar Papers

QuaLLM: An LLM-based Framework to Extract Quantitative Insights from Online Forums