🤖 AI Summary
This study addresses key challenges in open qualitative coding of large-scale online discourse data—namely, difficulty in conceptual discovery, high manual labor costs, and frequent omission of subtle contextual cues. It systematically evaluates five machine learning and generative AI methods for line-by-line text coding, benchmarking their performance against four human coders. The work introduces, for the first time, a “process-embedded” AI collaboration paradigm, wherein AI tools are deeply integrated into researchers’ analytical workflows rather than substituting human judgment. Results show that AI excels at efficiently extracting explicit content codes, whereas humans outperform AI in capturing interactional dynamics and deep contextual meaning. Their synergistic integration significantly enhances coding efficiency and theoretical sensitivity without compromising intercoder reliability or interpretive depth. This research provides both a methodological framework and empirical evidence for human-AI co-analysis in qualitative inquiry.
📝 Abstract
Open coding, a key inductive step in qualitative research, discovers and constructs concepts from human datasets. However, capturing extensive and nuanced aspects or "coding moments" can be challenging, especially with large discourse datasets. While some studies explore the potential of machine learning (ML) and generative AI (GAI) for open coding, few evaluation studies exist. We compare open coding results from five recently published ML/GAI approaches and four human coders, using a dataset of online chat messages about mobile learning software. Our systematic analysis reveals the strengths and weaknesses of ML/GAI approaches, uncovering the complementary potential between humans and AI. Line-by-line AI approaches effectively identify content-based codes, while humans excel in interpreting conversational dynamics. We discuss how embedded analytical processes could shape the results of ML/GAI approaches. Instead of replacing humans in open coding, researchers should integrate AI with and according to their analytical processes, e.g., as parallel co-coders.