Bringing Pedagogy into Focus: Evaluating Virtual Teaching Assistants' Question-Answering in Asynchronous Learning Environments

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Evaluations of current virtual teaching assistants (VTAs) lack educationally grounded frameworks for assessing question-answering quality in asynchronous learning environments, hindering rigorous measurement and cross-system comparison of pedagogical effectiveness. To address this, the authors propose a learning-science–informed framework for evaluating VTA question answering, designed for asynchronous forum discussions. The framework defines multidimensional pedagogical metrics grounded in established educational theory, including cognitive scaffolding, feedback appropriateness, and reflection promotion. Using expert-annotated data, the authors train supervised classifiers for automated assessment. Experiments confirm classifier efficacy, identify key accuracy determinants (e.g., discourse context modeling), and reveal generalization bottlenecks across domains and task types. By systematically integrating educational theory into VTA evaluation, the work improves interpretability, comparability, and pedagogical relevance, establishing a reproducible, theory-driven assessment paradigm for AI-enabled educational technologies.

📝 Abstract
Asynchronous learning environments (ALEs) are widely adopted for formal and informal learning, but timely and personalized support is often limited. In this context, Virtual Teaching Assistants (VTAs) can potentially reduce the workload of instructors, but rigorous and pedagogically sound evaluation is essential. Existing assessments often rely on surface-level metrics and lack sufficient grounding in educational theories, making it difficult to meaningfully compare the pedagogical effectiveness of different VTA systems. To bridge this gap, we propose an evaluation framework rooted in learning sciences and tailored to asynchronous forum discussions, a common VTA deployment context in ALEs. We construct classifiers using expert annotations of VTA responses on a diverse set of forum posts. We evaluate the effectiveness of our classifiers, identifying approaches that improve accuracy as well as challenges that hinder generalization. Our work establishes a foundation for theory-driven evaluation of VTA systems, paving the way for more pedagogically effective AI in education.
Problem

Research questions and friction points this paper is trying to address.

Evaluating Virtual Teaching Assistants' pedagogical effectiveness in asynchronous learning
Addressing lack of educational theory grounding in current VTA assessments
Developing theory-driven evaluation framework for AI teaching assistants
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposed evaluation framework grounded in learning sciences
Constructed classifiers using expert annotations of responses
Identified approaches that improve classifier accuracy, along with challenges that hinder generalization
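The classifier idea above can be sketched as follows. This is a minimal, stdlib-only illustration of training one per-dimension classifier on expert-annotated VTA responses; the example texts, the binary labels, and the bag-of-words nearest-centroid model are all illustrative assumptions, not the authors' actual data or method (the paper does not specify this implementation).

```python
# Illustrative sketch: one binary classifier per pedagogical dimension
# (e.g., "cognitive scaffolding"), trained on expert-annotated responses.
# The model here (bag-of-words + nearest centroid) is an assumption for
# demonstration, not the paper's method.
from collections import Counter

def featurize(text):
    """Lowercased bag-of-words counts for one response."""
    return Counter(text.lower().split())

def centroid(examples):
    """Average word counts over a list of feature Counters."""
    total = Counter()
    for ex in examples:
        total.update(ex)
    n = len(examples)
    return {w: c / n for w, c in total.items()}

def similarity(feats, cent):
    """Unnormalized dot product between a response and a class centroid."""
    return sum(count * cent.get(w, 0.0) for w, count in feats.items())

def train(annotated):
    """annotated: list of (response_text, label) pairs, label in {0, 1}."""
    pos = [featurize(t) for t, y in annotated if y == 1]
    neg = [featurize(t) for t, y in annotated if y == 0]
    return centroid(pos), centroid(neg)

def predict(model, text):
    """Assign the class whose centroid is closer to the response."""
    pos_c, neg_c = model
    f = featurize(text)
    return 1 if similarity(f, pos_c) >= similarity(f, neg_c) else 0

# Toy expert annotations for one dimension (hypothetical examples):
data = [
    ("what do you think happens if you vary the learning rate", 1),
    ("try breaking the problem into smaller steps first", 1),
    ("the answer is 42", 0),
    ("see the textbook", 0),
]
model = train(data)
print(predict(model, "what steps would you try first"))  # prints 1
```

In practice one such classifier would be trained per metric (scaffolding, feedback appropriateness, reflection promotion), which is what makes cross-system comparison along each dimension possible.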
Li Siyan
Columbia University
Zhen Xu
Columbia University
Vethavikashini Chithrra Raghuram
Columbia University
Xuanming Zhang
Columbia University
Renzhe Yu
Assistant Professor, Columbia University
Educational Data Science · Learning Analytics · Computational Social Science · Responsible AI
Zhou Yu
Columbia University