Ensembling Large Language Models to Characterize Affective Dynamics in Student-AI Tutor Dialogues

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior work has not systematically characterized the temporal dynamics of student emotions in large language model (LLM)-mediated tutoring. Method: We propose the first education-oriented LLM ensemble framework for fine-grained emotion perception, integrating Gemini, GPT-4o, and Claude to perform zero-shot multidimensional emotion annotation covering valence, arousal, learning helpfulness, and free-text labels, followed by ordinal-weighted pooling and cross-model majority consensus to generate robust emotional profiles. Results: Analysis of 16,986 real tutoring dialogue turns reveals that students exhibit overall mild positivity; confusion and curiosity dominate initial states; negative emotions are transient and readily reversible; and neutral states serve as critical inflection points that frequently precede transitions to positive affect, constituting high-value intervention windows. This work provides the first empirical evidence of structurally regular emotional evolution in AI tutoring, establishing both theoretical foundations and technical pathways for adaptive affective intervention.

📝 Abstract
While recent studies have examined the learning impact of large language models (LLMs) in educational contexts, the affective dynamics of LLM-mediated tutoring remain insufficiently understood. This work introduces the first ensemble-LLM framework for large-scale affect sensing in tutoring dialogues, advancing the conversation on responsible pathways for integrating generative AI into education by attending to learners' evolving affective states. To achieve this, we analyzed two semesters of data comprising 16,986 conversational turns exchanged between PyTutor, an LLM-powered AI tutor, and 261 undergraduate learners across three U.S. institutions. To investigate learners' emotional experiences, we generate zero-shot affect annotations from three frontier LLMs (Gemini, GPT-4o, Claude), including scalar ratings of valence, arousal, and learning-helpfulness, along with free-text emotion labels. These estimates are fused through rank-weighted intra-model pooling and plurality consensus across models to produce robust emotion profiles. Our analysis shows that during interaction with the AI tutor, students typically report mildly positive affect and moderate arousal. Yet learning is not uniformly smooth: confusion and curiosity are frequent companions to problem solving, and frustration, while less common, still surfaces in ways that can derail progress. Emotional states are short-lived: positive moments last slightly longer than neutral or negative ones, but they are fragile and easily disrupted. Encouragingly, negative emotions often resolve quickly, sometimes rebounding directly into positive states. Neutral moments frequently act as turning points, more often steering students upward than downward, suggesting opportunities for tutors to intervene at precisely these junctures.
Problem

Research questions and friction points this paper is trying to address.

Analyze affective dynamics in student-AI tutoring dialogues
Develop ensemble-LLM framework for large-scale affect sensing
Investigate emotional states during AI-mediated learning interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble LLM framework for large-scale affect sensing
Zero-shot affect annotations from three frontier LLMs
Rank-weighted intra-model pooling and plurality consensus
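The fusion steps listed above can be sketched in a few lines of Python. Note that the paper does not publish its exact weighting scheme, so the 1/rank weights, the per-model candidate lists, and the `fuse` helper below are illustrative assumptions rather than the authors' implementation:

```python
from collections import Counter

def rank_weighted_pool(ratings):
    """Pool one model's ranked candidate ratings (best first) with 1/rank weights."""
    weights = [1.0 / (rank + 1) for rank in range(len(ratings))]
    return sum(w * x for w, x in zip(weights, ratings)) / sum(weights)

def plurality_label(labels):
    """Most frequent free-text label across models (ties: first seen wins)."""
    return Counter(labels).most_common(1)[0][0]

def fuse(annotations):
    """annotations: {model_name: {"valence": [ranked floats], "label": str}}.
    Returns one fused emotion profile per dialogue turn."""
    pooled = [rank_weighted_pool(a["valence"]) for a in annotations.values()]
    return {
        "valence": sum(pooled) / len(pooled),  # average of per-model pools
        "label": plurality_label([a["label"] for a in annotations.values()]),
    }

# Hypothetical annotations for a single student turn.
annotations = {
    "gemini": {"valence": [0.6, 0.5], "label": "curious"},
    "gpt-4o": {"valence": [0.4], "label": "curious"},
    "claude": {"valence": [0.7, 0.6, 0.5], "label": "confused"},
}
profile = fuse(annotations)  # "curious" wins the 2-vs-1 plurality vote
```

The same pattern would apply to the arousal and learning-helpfulness scales; only the scalar key changes.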