Educators' Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting

📅 2025-06-10
🤖 AI Summary
This study investigates whether large language model (LLM)-based tutors can match or surpass human tutors on core pedagogical dimensions. Method: In a blind, text-only evaluation, annotators with teaching experience compared an LLM tutor with human tutors teaching grade-school math word problems on four dimensions: engagement, empathy, scaffolding, and conciseness. Contribution/Results: Annotators perceived the LLM tutor as performing better than the human tutors on all four metrics, with the largest advantage in empathy, where 80% of annotators preferred the LLM tutor more often than the human tutors. The results suggest that LLMs could be used to reduce the load on human teachers in the future.

📝 Abstract
The rapid development of Large Language Models (LLMs) opens up the possibility of using them as personal tutors. This has led to the development of several intelligent tutoring systems and learning assistants that use LLMs as back-ends with various degrees of engineering. In this study, we seek to compare human tutors with LLM tutors in terms of engagement, empathy, scaffolding, and conciseness. We ask human tutors to annotate and compare the performance of an LLM tutor with that of a human tutor in teaching grade-school math word problems on these qualities. We find that annotators with teaching experience perceive LLMs as showing higher performance than human tutors in all 4 metrics. The biggest advantage is in empathy, where 80% of our annotators prefer the LLM tutor more often than the human tutors. Our study paints a positive picture of LLMs as tutors and indicates that these models can be used to reduce the load on human teachers in the future.
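
The empathy figure in the abstract is a per-annotator statistic: the share of annotators who chose the LLM tutor more often than the human tutor. The sketch below shows one way such a number could be computed. It assumes each judgment is a binary choice between "llm" and "human" with no tie option, and the function name, record layout, and toy data are all invented for illustration; none of them come from the paper.

```python
from collections import defaultdict


def share_preferring_llm(judgments):
    """Fraction of annotators who picked the LLM tutor more often than the human tutor.

    `judgments` is an iterable of (annotator_id, preferred) pairs, where
    `preferred` is "llm" or "human". This record layout is an assumption.
    """
    counts = defaultdict(lambda: {"llm": 0, "human": 0})
    for annotator, preferred in judgments:
        if preferred not in ("llm", "human"):
            raise ValueError(f"unexpected label: {preferred!r}")
        counts[annotator][preferred] += 1
    if not counts:
        raise ValueError("no judgments supplied")
    # An annotator counts as favouring the LLM only on a strict majority.
    favouring = sum(1 for c in counts.values() if c["llm"] > c["human"])
    return favouring / len(counts)


# Toy data, invented for illustration only (not the study's data).
toy_judgments = [
    ("a1", "llm"), ("a1", "llm"), ("a1", "human"),
    ("a2", "llm"), ("a2", "llm"), ("a2", "llm"),
    ("a3", "human"), ("a3", "human"), ("a3", "llm"),
    ("a4", "llm"), ("a4", "human"), ("a4", "llm"),
]

print(share_preferring_llm(toy_judgments))
```

Note that this differs from a pooled preference rate over all comparisons: an annotator who picks the LLM tutor in 51% of comparisons and one who picks it in 100% both count once toward the share.
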
Problem

Research questions and friction points this paper is trying to address.

Compare human and LLM tutors on engagement, empathy, scaffolding, and conciseness
Evaluate an LLM tutor's performance in teaching grade-school math word problems
Assess the potential of LLMs to reduce human teachers' workload
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using LLMs as personal tutors
Comparing human and AI tutor performance
Annotators with teaching experience perceive the LLM tutor as more empathetic: 80% prefer it more often than the human tutors