AI Summary
Existing pedagogical agents suffer from rigid pre-scripted dialogues, unnatural animations, the uncanny valley effect induced by photorealistic rendering, and prohibitively high development costs. To address these limitations, VTutor introduces the first lightweight, web-deployable, open-source SDK that integrates large language models (LLMs), text-to-speech (TTS), and real-time lip-sync animation to generate stylized anime-style teaching agents, thereby avoiding photorealistic rendering pitfalls while enabling on-demand generation of personalized multimodal feedback. Implemented using WebGL, Unity, and JavaScript, VTutor significantly outperforms SadTalker in a 50-participant user study across synchronization accuracy, naturalness, emotional expressiveness, and overall preference. By drastically lowering integration barriers, VTutor provides educational platforms with a low-cost, highly scalable, real-time interactive agent solution.
Abstract
Pedagogical agents (PAs) show significant potential for boosting student engagement and learning outcomes by providing adaptive, on-demand support in educational contexts. However, existing PA solutions are often hampered by pre-scripted dialogue, unnatural animations, uncanny visual realism, and high development costs. To address these gaps, we introduce VTutor, an open-source SDK leveraging lightweight WebGL, Unity, and JavaScript frameworks. VTutor receives text outputs from a large language model (LLM), converts them into audio via text-to-speech (TTS), and then renders a real-time, lip-synced PA for immediate, large-scale deployment on web-based learning platforms. By providing on-demand, personalized feedback, VTutor strengthens students' motivation and deepens their engagement with instructional material. Using an anime-like aesthetic, VTutor alleviates the uncanny valley effect, allowing learners to engage with expressive yet comfortably stylized characters. Our evaluation with 50 participants revealed that VTutor significantly outperforms existing talking-head approaches (e.g., SadTalker) on perceived synchronization accuracy, naturalness, emotional expressiveness, and overall preference. As an open-source project, VTutor welcomes community-driven contributions, from novel character designs to specialized showcases of pedagogical agent applications, that fuel ongoing innovation in AI-enhanced education. By providing an accessible, customizable, and learner-centered PA solution, VTutor aims to improve the human-AI interaction experience in education, ultimately broadening the impact of AI in learning contexts. The demo link to VTutor is at https://vtutor-aied25.vercel.app.
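The three-stage flow described in the abstract (LLM text, then speech, then lip-synced animation) can be illustrated with a minimal sketch. This is not VTutor's API: every name below (`stub_llm`, `stub_tts_duration`, `lip_sync_keyframes`, `run_pipeline`, the mouth-shape labels) is hypothetical, the LLM and TTS stages are stand-in stubs, and the lip sync is a crude spelling-based approximation rather than the phoneme- or audio-driven method a real system would likely use.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class MouthKeyframe:
    time_s: float  # when the mouth shape should be shown
    shape: str     # hypothetical mouth-shape label, e.g. "A", "O", "closed"


# Hypothetical vowel-to-mouth-shape table (an assumption for illustration).
VOWEL_SHAPES = {"a": "A", "e": "E", "i": "I", "o": "O", "u": "U"}


def stub_llm(prompt: str) -> str:
    """Stand-in for an LLM call; returns a canned reply."""
    return "Good try. Check the sign."


def stub_tts_duration(text: str, seconds_per_char: float = 0.06) -> float:
    """Stand-in for TTS; only estimates how long the audio would last."""
    return len(text) * seconds_per_char


def lip_sync_keyframes(text: str, duration_s: float) -> List[MouthKeyframe]:
    """Spread characters evenly over the audio and emit mouth-shape changes."""
    if not text:
        return []
    step = duration_s / len(text)
    frames: List[MouthKeyframe] = []
    for i, ch in enumerate(text):
        shape = VOWEL_SHAPES.get(ch.lower(), "closed")
        # Skip repeats so the renderer only receives shape changes.
        if not frames or frames[-1].shape != shape:
            frames.append(MouthKeyframe(round(i * step, 3), shape))
    return frames


def run_pipeline(
    prompt: str,
    llm: Callable[[str], str] = stub_llm,
    tts: Callable[[str], float] = stub_tts_duration,
) -> Tuple[str, float, List[MouthKeyframe]]:
    """Chain the stages: text reply, audio duration, mouth keyframes."""
    reply = llm(prompt)
    duration = tts(reply)
    return reply, duration, lip_sync_keyframes(reply, duration)
```

Passing the LLM and TTS stages in as plain callables keeps the sketch independent of any particular provider; a renderer would consume the keyframe list alongside the synthesized audio.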