Generative Large Language Models (gLLMs) in Content Analysis: A Practical Guide for Communication Research

📅 2025-10-28
🤖 AI Summary
This paper addresses seven key challenges in applying generative large language models (gLLMs) to quantitative content analysis in communication research: (1) codebook development, (2) prompt engineering, (3) model selection, (4) parameter tuning, (5) iterative refinement, (6) reliability validation, and, optionally, (7) performance enhancement. Synthesizing emerging research, it proposes an end-to-end best-practice guide for gLLM-assisted coding that covers instructing models such as ChatGPT in natural language, validating outputs against human-coded data, and refining prompts and parameters over multiple rounds. The aim is a reproducible, verifiable, and ethically grounded protocol: prior studies suggest that gLLM coding can match or exceed crowd workers and trained coders on many communication-science tasks while drastically reducing annotation time and cost, moving content analysis toward a more efficient, transparent, and standardized paradigm.

📝 Abstract
Generative Large Language Models (gLLMs), such as ChatGPT, are increasingly being used in communication research for content analysis. Studies show that gLLMs can outperform both crowd workers and trained coders, such as research assistants, on various coding tasks relevant to communication science, often at a fraction of the time and cost. Additionally, gLLMs can decode implicit meanings and contextual information, be instructed using natural language, be deployed with only basic programming skills, and require little to no annotated data beyond a validation dataset, constituting a paradigm shift in automated content analysis. Despite their potential, the integration of gLLMs into the methodological toolkit of communication research remains underdeveloped. In gLLM-assisted quantitative content analysis, researchers must address at least seven critical challenges that impact result quality: (1) codebook development, (2) prompt engineering, (3) model selection, (4) parameter tuning, (5) iterative refinement, (6) validation of the model's reliability, and optionally, (7) performance enhancement. This paper synthesizes emerging research on gLLM-assisted quantitative content analysis and proposes a comprehensive best-practice guide to navigate these challenges. Our goal is to make gLLM-based content analysis more accessible to a broader range of communication researchers and ensure adherence to established disciplinary quality standards of validity, reliability, reproducibility, and research ethics.
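Challenge (6), validating the model's reliability, is typically operationalized by comparing gLLM labels against a human-coded validation set using a chance-corrected agreement coefficient. A minimal sketch using Cohen's kappa is shown below; the paper may recommend other coefficients (e.g., Krippendorff's alpha), and the labels here are invented for illustration:

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two coders (e.g., human vs. gLLM)."""
    assert labels_a and len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: share of items both coders labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented example: one human coder vs. gLLM output on 10 items.
human = ["pos", "neg", "pos", "neu", "neg", "pos", "pos", "neu", "neg", "pos"]
model = ["pos", "neg", "pos", "neu", "pos", "pos", "pos", "neu", "neg", "neg"]
print(round(cohen_kappa(human, model), 3))  # → 0.677
```

Whether a given coefficient value is acceptable depends on the field's conventions; communication research typically expects values well above chance-corrected thresholds commonly applied to human inter-coder reliability.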
Problem

Research questions and friction points this paper is trying to address.

Addressing underdeveloped integration of gLLMs in communication research
Overcoming seven critical challenges for quality content analysis
Making gLLM-based analysis accessible while ensuring validity and reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

gLLMs decode implicit meanings and contextual information from natural-language instructions
gLLMs require minimal annotated data and only basic programming skills to deploy
A best-practice guide for navigating the seven critical challenges to result quality
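Because gLLMs are instructed in natural language, challenges (1) and (2), codebook development and prompt engineering, amount to translating a codebook entry into an instruction the model can follow. A minimal, hypothetical sketch of such a prompt builder (the template, variable names, and example category are illustrative, not taken from the paper):

```python
def build_coding_prompt(category, definition, labels, examples, text):
    """Turn a codebook entry into a few-shot coding instruction for a gLLM."""
    # Few-shot examples: coded instances taken from the codebook.
    shots = "\n".join(f'Text: "{t}" -> {label}' for t, label in examples)
    return (
        f"You are a content-analysis coder. Variable: {category}.\n"
        f"Definition: {definition}\n"
        f"Allowed labels: {', '.join(labels)}.\n"
        f"Examples:\n{shots}\n"
        "Code the following text, answering with exactly one label.\n"
        f'Text: "{text}" ->'
    )

prompt = build_coding_prompt(
    category="incivility",
    definition="Presence of disrespectful or insulting language.",
    labels=["yes", "no"],
    examples=[("You absolute idiot.", "yes"),
              ("I disagree with this policy.", "no")],
    text="Thanks for the thoughtful reply!",
)
print(prompt)
```

In practice the prompt would be sent to a gLLM API, typically with a low temperature setting for more reproducible coding (challenge 4); the API call itself is omitted here because client libraries and parameters vary by model.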
Daria Kravets-Meinke
Bavarian Research Institute for Digital Transformation (bidt)
Hannah Schmid-Petri
University of Passau
Sonja Niemann
Bavarian Research Institute for Digital Transformation (bidt)
Ute Schmid
Professor of Cognitive Systems, University of Bamberg
Interpretable Machine Learning · Artificial Intelligence · Cognitive Science · Inductive Programming · Analogy