Causal Structure Discovery for Error Diagnostics of Children's ASR

📅 2025-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Children’s automatic speech recognition (ASR) lags significantly behind adult ASR because of intertwined physiological (e.g., vocal tract morphology), cognitive (e.g., immature articulation), and extrinsic (e.g., limited vocabulary, environmental noise) factors, yet conventional analyses treat these factors in isolation. Method: This work integrates causal structure discovery (PC/NOTEARS) and structural causal modeling (SCM) into ASR diagnostics for children. It uncovers latent causal pathways such as “age → articulation → recognition errors,” quantifies direct and indirect effects, and conducts counterfactual intervention analysis by fine-tuning Whisper and Wav2Vec 2.0. Contribution/Results: Articulation proficiency is identified as the primary mediator of age-related ASR disparities. Fine-tuning effectively mitigates vocabulary constraints but yields only marginal improvement on physiologically grounded acoustic mismatches. The findings generalize across both models, moving beyond isolated-factor attribution in child ASR research.
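The summary names PC as one of the discovery algorithms. PC works by testing conditional independence and deleting edges between variables that become independent once a third variable is controlled for. The sketch below illustrates that single step with a partial-correlation test on synthetic data. Everything in it is an assumption made for illustration: the helper `partial_corr`, the variable names, the coefficients, and the deliberately simplified chain in which age has no direct effect on errors. It is not the paper's data, graph, or implementation.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # residual of x given z
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]  # residual of y given z
    return float(np.corrcoef(rx, ry)[0, 1])

rng = np.random.default_rng(0)
n = 5000

# Hypothetical chain: age -> articulation -> error (no direct age -> error edge).
age = rng.normal(size=n)
articulation = 0.8 * age + rng.normal(size=n)
error = -0.7 * articulation + rng.normal(size=n)

# Age and error are clearly correlated when articulation is ignored ...
marginal = float(np.corrcoef(age, error)[0, 1])
# ... but nearly uncorrelated once articulation is conditioned on,
# which is the evidence PC uses to drop the age - error edge.
conditional = partial_corr(age, error, articulation)

print(f"marginal corr(age, error)        = {marginal:.3f}")
print(f"partial corr(age, error | artic) = {conditional:.3f}")
```

If age also affected errors directly, as the paper reports for real data, the partial correlation would stay away from zero and the edge would be retained.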

📝 Abstract

Children's automatic speech recognition (ASR) often underperforms compared to that of adults due to a confluence of interdependent factors: physiological (e.g., smaller vocal tracts), cognitive (e.g., underdeveloped pronunciation), and extrinsic (e.g., vocabulary limitations, background noise). Existing analysis methods examine the impact of these factors in isolation, neglecting interdependencies, such as age affecting ASR accuracy both directly and indirectly via pronunciation skills. In this paper, we introduce causal structure discovery to unravel these interdependent relationships among physiology, cognition, extrinsic factors, and ASR errors. Then, we employ causal quantification to measure each factor's impact on children's ASR. We extend the analysis to fine-tuned models to identify which factors are mitigated by fine-tuning and which remain largely unaffected. Experiments on Whisper and Wav2Vec 2.0 demonstrate the generalizability of our findings across different ASR systems.
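The abstract's example of age acting on ASR accuracy both directly and indirectly via pronunciation can be made concrete with a linear mediation decomposition. The sketch below is a minimal illustration under strong assumptions: a linear structural causal model, no unobserved confounders, and synthetic data. The helper `ols`, the variable names (`age`, `pron`, `wer`), and all coefficients are hypothetical and are not taken from the paper.

```python
import numpy as np

def ols(y, *xs):
    """Least-squares coefficients [intercept, coef_1, ...] of y on xs."""
    X = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 20000

# Hypothetical linear SCM (coefficients chosen for illustration only):
#   age -> pron            (older children pronounce more accurately)
#   age -> wer, pron -> wer (error rate falls with both)
age = rng.normal(size=n)
pron = 0.8 * age + rng.normal(size=n)
wer = -0.2 * age - 0.5 * pron + rng.normal(size=n)

a = ols(pron, age)[1]              # effect of age on the mediator
_, direct, b = ols(wer, age, pron)  # direct age effect, mediator effect
indirect = a * b                   # effect routed through pronunciation
total = ols(wer, age)[1]           # total effect of age on error rate

print(f"direct   = {direct:.3f}")
print(f"indirect = {indirect:.3f}")
print(f"total    = {total:.3f}")
```

In a linear model fitted by least squares the total effect equals the direct effect plus the product `a * b`, so the share of the age effect carried by pronunciation can be read off as `indirect / total`.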

Problem

Research questions and friction points this paper is trying to address.

Identify interdependent factors causing children's ASR errors

Quantify each factor's impact on ASR performance

Determine which factors persist after model fine-tuning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal structure discovery for ASR error diagnostics

Quantify factors' impact on children's ASR performance

Analyze fine-tuned models to identify mitigated factors

🔎 Similar Papers

Evaluation of state-of-the-art ASR Models in Child-Adult Interactions