🤖 AI Summary
Accessibility barriers impede programming education for blind beginners. Method: This study presents the first empirical investigation of a multimodal generative AI voice interface for Python instruction, implemented via OpenAI's Realtime API. We deployed a real-time voice tutoring system in an authentic classroom setting, collecting 1,210 voice interactions; the analysis combined qualitative coding with the Partner Modeling Questionnaire to examine interaction patterns, audio feedback quality, and learner perceptions. Results: The system's feedback was correct in 71.4% of 416 feedback instances. Learners used it primarily for code diagnosis and debugging, judging it technically competent but only somewhat human-like. Critical audio fidelity issues were identified, including distortion when reading code aloud and semantic segmentation errors. This work establishes the first empirically grounded framework characterizing the real-time interaction dynamics of GenAI voice tutors in accessible programming education, offering concrete design implications and optimization pathways for inclusive AI-augmented learning tools.
📝 Abstract
Real-time voice interfaces built on multimodal Generative AI (GenAI) can potentially address the accessibility needs of novice programmers with disabilities (e.g., visual impairments). Yet little is known about how novices interact with such GenAI tools or about the quality of the feedback these tools deliver as audio output. This paper analyzes audio dialogues from nine 9th-grade students who used a voice-enabled tutor (powered by OpenAI's Realtime API) in an authentic classroom setting while learning Python. We examined the students' voice prompts and the AI's responses (1,210 messages) using qualitative coding, and we gathered students' perceptions via the Partner Modeling Questionnaire. The GenAI voice tutor primarily offered feedback on mistakes and next steps, but its correctness was limited (71.4% of 416 feedback outputs were correct). Quality issues arose particularly when the AI attempted to utter programming code elements. Students used the GenAI voice tutor primarily for debugging and perceived it as competent, only somewhat human-like, and flexible. This study is the first to explore the interaction dynamics between real-time voice GenAI tutors and novice programmers, informing the design of future educational tools and potentially addressing the accessibility needs of diverse learners.