AITutor-EvalKit: Exploring the Capabilities of AI Tutors

📅 2025-12-03
🤖 AI Summary
This study addresses the lack of systematic, interpretable evaluation of AI teaching assistants’ instructional quality. We propose the first standardized assessment framework that jointly evaluates pedagogical effectiveness and model interpretability. Methodologically, we develop an open-source, language-technology-driven evaluation toolkit that integrates NLP-based analysis, model attribution techniques, interactive visualization, and user feedback annotation, enabling multi-scenario evaluation of AI tutors. Our contributions are threefold: (1) the first integration of educational validity metrics with explainable AI (XAI) methods to establish fine-grained, pedagogy-oriented evaluation dimensions; (2) an end-to-end software tool supporting model behavior diagnostics, pedagogical strategy attribution, and data-driven optimization; and (3) improved transparency, auditability, and practical adaptability of educational AI systems, with the tool aimed at education stakeholders and the *ACL community.

📝 Abstract
We present AITutor-EvalKit, an application that uses language technology to evaluate the pedagogical quality of AI tutors and provides software for demonstration, evaluation, model inspection, and data visualization. The tool is aimed at education stakeholders as well as the *ACL community at large, as it supports learning and can also be used to collect user feedback and annotations.
Problem

Research questions and friction points this paper is trying to address.

Evaluates pedagogical quality of AI tutors
Provides software for demonstration and evaluation
Supports learning and collects user feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses language technology to evaluate AI tutor pedagogy
Provides software for demonstration, evaluation, and visualization
Collects user feedback and annotations for AI tutor improvement