🤖 AI Summary
This study addresses the lack of systematic, interpretable evaluation of the instructional quality of AI teaching assistants. We propose the first standardized assessment framework that jointly evaluates pedagogical effectiveness and model interpretability. Methodologically, we develop an open-source, language-technology-driven evaluation toolkit that integrates NLP-based analysis, model attribution techniques, interactive visualization, and user feedback annotation, enabling multi-scenario evaluation of AI tutors. Our contributions are threefold: (1) the first integration of educational validity metrics with explainable AI (XAI) methods, establishing fine-grained, pedagogy-oriented evaluation dimensions; (2) an end-to-end software tool supporting model behavior diagnostics, pedagogical strategy attribution, and data-driven optimization; and (3) significantly improved transparency, auditability, and practical adaptability of educational AI systems, already deployed for educators and the *ACL community.
📝 Abstract
We present AITutor-EvalKit, an application that uses language technology to evaluate the pedagogical quality of AI tutors. It provides software for demonstration and evaluation, as well as for model inspection and data visualization. The tool is aimed at education stakeholders and the *ACL community at large: it supports learning and can also be used to collect user feedback and annotations.