IntrEx: A Dataset for Modeling Engagement in Educational Conversations

📅 2025-09-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the challenge of modeling learner engagement in educational dialogues through language modeling. To this end, we introduce IntrEx, the first large-scale dataset with sequence-level engagement (i.e., "interestingness") annotations, constructed from authentic teacher-student chat logs. Inspired by reinforcement learning from human feedback (RLHF), we adopt a pairwise comparative annotation paradigm that improves labeling consistency and explicitly captures how engagement evolves over dialogue sequences. Combining human annotations from second-language learners with fine-tuning of compact 7B/8B-parameter language models, we develop a lightweight yet effective engagement prediction model that significantly outperforms general-purpose large language models, including GPT-4o, on engagement prediction tasks. These findings support both the necessity and the effectiveness of domain-specific annotation frameworks and sequence-level modeling for accurate engagement assessment in educational dialogue systems.

📝 Abstract
Engagement and motivation are crucial for second-language acquisition, yet maintaining learner interest in educational conversations remains a challenge. While prior research has explored what makes educational texts interesting, little is known about the linguistic features that drive engagement in conversations. To address this gap, we introduce IntrEx, the first large dataset annotated for interestingness and expected interestingness in teacher-student interactions. Built upon the Teacher-Student Chatroom Corpus (TSCC), IntrEx extends prior work by incorporating sequence-level annotations, allowing the study of engagement beyond isolated turns and capturing how interest evolves over extended dialogues. We employ a rigorous annotation process with over 100 second-language learners, using a comparison-based rating approach inspired by reinforcement learning from human feedback (RLHF) to improve agreement. We investigate whether large language models (LLMs) can predict human interestingness judgments. We find that LLMs (7B/8B parameters) fine-tuned on interestingness ratings outperform larger proprietary models like GPT-4o, demonstrating the potential of specialised datasets to model engagement in educational settings. Finally, we analyze how linguistic and cognitive factors, such as concreteness, comprehensibility (readability), and uptake, influence engagement in educational dialogues.
Problem

Research questions and friction points this paper is trying to address.

Modeling engagement in educational conversations for language learning
Identifying linguistic features driving student interest in dialogues
Predicting human interestingness judgments using large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large dataset with sequence-level interestingness annotations
Comparison-based rating approach inspired by RLHF
Fine-tuned LLMs outperform larger proprietary models
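The comparison-based rating idea above can be made concrete. The paper does not release this code; the sketch below is an illustrative assumption showing the standard way RLHF-style pairwise judgments ("which dialogue turn is more interesting?") are aggregated into scalar scores, here with a Bradley-Terry model fitted by minorization-maximization updates. The function name and toy data are hypothetical.

```python
# Illustrative sketch (not the paper's code): turning pairwise
# "A is more interesting than B" judgments into per-item scores
# with a Bradley-Terry model.
from collections import defaultdict

def bradley_terry(comparisons, n_items, iters=200):
    """comparisons: list of (winner, loser) index pairs."""
    wins = defaultdict(int)          # wins[i] = total wins of item i
    pair_counts = defaultdict(int)   # times each unordered pair was compared
    for w, l in comparisons:
        wins[w] += 1
        pair_counts[(min(w, l), max(w, l))] += 1

    scores = [1.0] * n_items
    for _ in range(iters):           # MM updates (Hunter, 2004)
        new = []
        for i in range(n_items):
            denom = 0.0
            for (a, b), n in pair_counts.items():
                if i in (a, b):
                    j = b if i == a else a
                    denom += n / (scores[i] + scores[j])
            new.append(wins[i] / denom if denom > 0 else scores[i])
        norm = sum(new) / n_items    # fix the arbitrary scale
        scores = [s / norm for s in new]
    return scores

# Toy example: item 0 is consistently judged more intriguing than 1 and 2,
# so its fitted score comes out highest.
scores = bradley_terry([(0, 1), (0, 2), (1, 2), (0, 1)], n_items=3)
```

Comparative annotation like this tends to yield higher inter-annotator agreement than absolute Likert ratings, which is the motivation the paper cites for borrowing the paradigm from RLHF.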
Xingwei Tan
Research Associate
Natural Language Processing
Mahathi Parvatham
Department of Psychology, University of Warwick, UK
Chiara Gambi
Department of Psychology, University of Warwick, UK
Gabriele Pergola
Assistant Professor, University of Warwick
Natural Language Processing · Sentiment Analysis · Question Answering · Machine Learning