AI Summary
ESL learners rely heavily on instructor feedback in prosody training and have limited autonomy in self-directed practice. To address this, we propose a dubbing-based interactive visualization system that supports independent prosody perception and pronunciation training through a three-stage workflow: synchronized listening, guided repeating (shadowing), and comparative reflection. The system combines automatic rhythm extraction from English speech with a multi-view visual design, and its interaction design is informed by a formative study with spoken English instructors. A controlled user study with twelve ESL learners shows a statistically significant improvement in learners' rhythm perception (p < 0.01) and promising gains in the rhythmic accuracy of their pronunciation. These findings offer empirical support and a scalable technical framework for autonomous ESL speech learning.
Abstract
English speech rhythm, the temporal pattern of stressed syllables, is essential for English as a second language (ESL) learners to produce natural-sounding and comprehensible speech. Rhythm training is generally based on imitating native speech. However, it relies heavily on external instructor feedback, which prevents ESL learners from practicing independently. To address this gap, we present RhythmTA, an interactive system that lets ESL learners practice speech rhythm independently through dubbing, an imitation-based approach. The system automatically extracts rhythm from any English speech and introduces novel visual designs to support three stages of dubbing practice: (1) synchronized listening with visual aids to enhance perception, (2) guided repeating with visual cues for self-adjustment, and (3) comparative reflection in a parallel view for self-monitoring. Our design is informed by a formative study with nine spoken English instructors, which identified current practices and challenges. A user study with twelve ESL learners demonstrates that RhythmTA effectively enhances learners' rhythm perception and shows significant potential for improving rhythm production.