A Large-Scale Real-World Evaluation of LLM-Based Virtual Teaching Assistant

📅 2025-06-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of empirical evidence on the effectiveness and acceptance of large language model (LLM)-based virtual teaching assistants (VTAs) in real classrooms. We deployed a VTA at scale in an introductory AI programming course with 477 graduate students. Our evaluation combined three rounds of student surveys, analysis of 3,869 student-VTA interaction pairs, and a comparison with traditional student-human instructor interactions, supported by question-type categorization and engagement-pattern analysis. Key contributions include: (1) evidence that students grow more sensitive to response accuracy over time, along with observed engagement decay; (2) validation of VTA feasibility for foundational programming Q&A, along with a characterization of high-frequency question types; and (3) an open-source release of the VTA system, which supports a shift in educational AI research from technical proof-of-concept to classroom-validated evidence.

📝 Abstract

Virtual Teaching Assistants (VTAs) powered by Large Language Models (LLMs) have the potential to enhance student learning by providing instant feedback and facilitating multi-turn interactions. However, empirical studies on their effectiveness and acceptance in real-world classrooms are limited, leaving their practical impact uncertain. In this study, we develop an LLM-based VTA and deploy it in an introductory AI programming course with 477 graduate students. To assess how student perceptions of the VTA's performance evolve over time, we conduct three rounds of comprehensive surveys at different stages of the course. Additionally, we analyze 3,869 student-VTA interaction pairs to identify common question types and engagement patterns. We then compare these interactions with traditional student-human instructor interactions to evaluate the VTA's role in the learning process. Through a large-scale empirical study and interaction analysis, we assess the feasibility of deploying VTAs in real-world classrooms and identify key challenges for broader adoption. Finally, we release the source code of our VTA system, fostering future advancements in AI-driven education: https://github.com/sean0042/VTA

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM-based Virtual Teaching Assistants in real classrooms

Assessing student perceptions and engagement with VTA over time

Comparing VTA interactions with human instructor interactions

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based VTA deployed in real classroom

Surveys and interaction analysis for evaluation

Open-source VTA system for AI education
