K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function

📅 2025-07-03

🤖 AI Summary

Children’s speech, characterized by high pitch, prolonged phonemes, and scarce annotated data, degrades ASR performance and hinders early language assessment. To address this, we propose K-Function, a framework whose core, Kids-WFST, pairs a Wav2Vec2-based phoneme encoder with a Dysfluent-WFST decoder constructed from phoneme similarity metrics, enabling accurate, interpretable recognition of child-specific pronunciation errors. We further incorporate a large language model to generate objective scoring, visualized pronunciation diagnostics, and personalized intervention recommendations. Evaluated on the MyST and Multitudes datasets, Kids-WFST achieves phoneme error rates of 1.39% and 8.61%, respectively, which are absolute improvements of 10.47 and 7.06 points over a greedy-search decoder. Its outputs demonstrate strong agreement with clinical expert assessments (Cohen’s κ > 0.85), supporting its use for scalable pediatric language evaluation.

📝 Abstract

Early evaluation of children's language is frustrated by the high pitch, long phones, and sparse data that derail automatic speech recognisers. We introduce K-Function, a unified framework that combines accurate sub-word transcription, objective scoring, and actionable feedback. Its core, Kids-WFST, merges a Wav2Vec2 phoneme encoder with a phoneme-similarity Dysfluent-WFST to capture child-specific errors while remaining fully interpretable. Kids-WFST attains 1.39% phoneme error on MyST and 8.61% on Multitudes--absolute gains of 10.47 and 7.06 points over a greedy-search decoder. These high-fidelity transcripts power an LLM that grades verbal skills, milestones, reading, and comprehension, aligning with human proctors and supplying tongue-and-lip visualizations plus targeted advice. The results show that precise phoneme recognition cements a complete diagnostic-feedback loop, paving the way for scalable, clinician-ready language assessment.
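
The phoneme error figures above can be made concrete with a short sketch. It assumes the conventional definition of phoneme error rate (Levenshtein edit distance between reference and hypothesis phoneme sequences, divided by the number of reference phonemes); the abstract does not spell out its scoring procedure, so the function names and the toy phoneme sequences here are illustrative only. The last lines simply add the reported gains back to the reported error rates to show what greedy-decoder baseline they imply.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """PER in percent over a corpus of (reference, hypothesis) pairs."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


# Hypothetical toy utterance: "cat" recognized with a final /D/ instead of /T/.
toy_ref = ["K", "AE", "T"]
toy_hyp = ["K", "AE", "D"]
toy_per = phoneme_error_rate([toy_ref], [toy_hyp])  # one error in three phonemes

# Figures quoted in the abstract: (Kids-WFST PER, absolute gain over greedy search).
reported = {"MyST": (1.39, 10.47), "Multitudes": (8.61, 7.06)}
implied_baseline = {
    name: round(kids_wfst_per + gain, 2)
    for name, (kids_wfst_per, gain) in reported.items()
}
print(implied_baseline)  # {'MyST': 11.86, 'Multitudes': 15.67}
```

Read this way, the reported gains correspond to a greedy-search baseline of roughly 11.86% PER on MyST and 15.67% on Multitudes.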

Problem

Research questions and friction points this paper is trying to address.

Evaluating children's language is hindered by high pitch, long phones, and sparse data, which derail automatic speech recognisers

Providing accurate sub-word transcription and actionable feedback for kids

Enabling scalable clinician-ready language assessment via precise phoneme recognition

Innovation

Methods, ideas, or system contributions that make the work stand out.

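Combines sub-word transcription, objective scoring, and actionable feedback in one framework

Kids-WFST pairs a Wav2Vec2 phoneme encoder with a phoneme-similarity Dysfluent-WFST to reduce phoneme error while staying interpretable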

LLM grades skills with human-aligned feedback
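
To illustrate why phoneme similarity helps a decoder stay interpretable, here is a minimal sketch of similarity-weighted alignment. It is not the paper's Dysfluent-WFST: it swaps the WFST for a plain dynamic-programming alignment, and the `SIMILARITY` table, its scores, and the example word are hypothetical. The idea it shows is that substituting a similar phoneme (such as /W/ for /R/) costs less than an arbitrary substitution, so the alignment labels the child's error as a specific, readable substitution.

```python
# Hypothetical similarity scores in [0, 1]; not taken from the paper.
SIMILARITY = {
    frozenset(("R", "W")): 0.8,
    frozenset(("T", "D")): 0.7,
}


def sub_cost(a, b):
    """Substitution cost: 0 for a match, lower for similar phonemes."""
    if a == b:
        return 0.0
    return 1.0 - SIMILARITY.get(frozenset((a, b)), 0.0)


def align(ref, hyp, indel=1.0):
    """Return (total cost, labelled operations) for the cheapest alignment."""
    n, m = len(ref), len(hyp)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], back[i][0] = i * indel, "del"
    for j in range(1, m + 1):
        cost[0][j], back[0][j] = j * indel, "ins"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (cost[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]), "sub"),
                (cost[i - 1][j] + indel, "del"),
                (cost[i][j - 1] + indel, "ins"),
            ]
            cost[i][j], back[i][j] = min(options)

    # Walk the back-pointers from the end to recover the operations.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "sub":
            tag = "match" if ref[i - 1] == hyp[j - 1] else "substitution"
            ops.append((tag, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif move == "del":
            ops.append(("deletion", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("insertion", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return cost[n][m], ops


# Hypothetical example: target "rabbit" produced with /W/ in place of /R/.
target = ["R", "AE", "B", "AH", "T"]
spoken = ["W", "AE", "B", "AH", "T"]
total, operations = align(target, spoken)
for op in operations:
    print(op)
```

The first operation printed is the /R/ to /W/ substitution and the rest are matches, which is the kind of per-phoneme diagnosis that downstream scoring and feedback could build on.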
