🤖 AI Summary
This study addresses a methodological bottleneck in large-scale qualitative analysis of K–12 educator–AI interactions by investigating how teachers actually integrate generative AI tools into their practice. We propose a four-phase, hierarchical coding pipeline that combines human-led inductive theme discovery with large language model (LLM) assistance, using models such as Claude 3.5 Haiku, to perform theme identification, code assignment, and benchmarked evaluation against a structured codebook aligned with established teacher evaluation frameworks. Our approach supports efficient, reliable qualitative analysis of over 140,000 educator–AI messages, with Claude 3.5 Haiku outperforming open-weight models in both accuracy and structural reliability. Results show that 79.7% of conversations concern enhancing instructional practice, and reveal three core application patterns: human–AI co-design, differentiated instructional support, and teacher professional development. The study offers a scalable, transparent model for qualitative research on AI-mediated educational interactions.
📝 Abstract
The integration of large language models (LLMs) into educational tools has the potential to substantially impact how teachers plan instruction, support diverse learners, and engage in professional reflection. Yet little is known about how educators actually use these tools in practice and how their interactions with AI can be meaningfully studied at scale. This paper presents a human-AI collaborative methodology for large-scale qualitative analysis of over 140,000 educator-AI messages drawn from a generative AI platform used by K-12 teachers. Through a four-phase coding pipeline, we combined inductive theme discovery, codebook development, structured annotation, and model benchmarking to examine patterns of educator engagement and evaluate the performance of LLMs in qualitative coding tasks. We developed a hierarchical codebook aligned with established teacher evaluation frameworks, capturing educators' instructional goals, contextual needs, and pedagogical strategies. Our findings demonstrate that LLMs, particularly Claude 3.5 Haiku, can reliably support theme identification, extend human recognition in complex scenarios, and outperform open-weight models in both accuracy and structural reliability. The analysis also reveals substantive patterns in how educators query AI to enhance instructional practices (79.7 percent of total conversations), create or adapt content (76.1 percent), support assessment and feedback loops (46.9 percent), attend to student needs for tailored instruction (43.3 percent), and assist with other professional responsibilities (34.2 percent), highlighting emerging AI-related competencies that have direct implications for teacher preparation and professional development. This study offers a scalable, transparent model for AI-augmented qualitative research and provides foundational insights into the evolving role of generative AI in educational practice.
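The reported category shares sum to well over 100 percent, which suggests that a single conversation can carry several top-level codes at once. A minimal sketch of how such multi-label prevalence figures could be computed is given here; the conversation IDs, code names, and the `code_prevalence` helper are hypothetical illustrations, not the authors' data or implementation.

```python
from collections import Counter

# Hypothetical toy annotations (NOT the study's data): each conversation
# maps to the set of top-level codes assigned to it. A conversation may
# carry more than one code.
annotations = {
    "conv-1": {"instructional_practice", "content_creation"},
    "conv-2": {"instructional_practice", "assessment_feedback"},
    "conv-3": {"content_creation", "student_needs"},
    "conv-4": {"instructional_practice", "content_creation", "professional_duties"},
}


def code_prevalence(annotations):
    """Return, for each code, the percentage of conversations that carry it."""
    total = len(annotations)
    if total == 0:
        return {}
    # Count each code at most once per conversation.
    counts = Counter(code for codes in annotations.values() for code in set(codes))
    return {code: 100.0 * n / total for code, n in counts.items()}


prevalence = code_prevalence(annotations)
for code, pct in sorted(prevalence.items(), key=lambda kv: -kv[1]):
    print(f"{code}: {pct:.1f}%")
```

Because each conversation is counted once per code it carries, the percentages are shares of conversations rather than a partition of them, so they need not sum to 100.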