🤖 AI Summary
Existing automatic interpreting quality assessment (AIQA) research suffers from three key limitations: inadequate modeling of linguistic quality, scarcity and class imbalance in annotated data, and poor model interpretability. This paper addresses English–Chinese consecutive interpreting in university classroom settings by proposing a transparency-oriented evaluation framework centered on interpretability. The framework integrates construct-relevant linguistic features (e.g., Chinese phraseological diversity, pause patterns), cross-lingual semantic metrics (BLEURT, CometKiwi), and SMOTE-based data augmentation; it further employs SHAP values to enable fine-grained attribution of fidelity, fluency, and linguistic quality scores. Experiments on a newly curated English–Chinese interpreting dataset demonstrate strong prediction performance. Crucially, the framework delivers pedagogically meaningful diagnostic feedback, empirically validating the efficacy and practical potential of feature-driven, interpretable modeling for automated scoring and learner support.
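To make the data augmentation step concrete, here is a minimal sketch, on synthetic data with hypothetical feature names (`bleurt`, `cometkiwi`, `pause_rate`, `phrase_diversity`) and rubric bands, of how SMOTE can rebalance scarce, imbalanced labels before fitting a simple scorer. It is an illustration of the general technique, not the authors' implementation.

```python
# Minimal sketch: SMOTE rebalancing of rubric-band labels, then a simple classifier.
# Feature names, band labels, and the classifier choice are illustrative assumptions.
import numpy as np
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

np.random.seed(0)

# Hypothetical construct-relevant features per interpreted rendition.
X = pd.DataFrame({
    "bleurt": np.random.rand(200),           # cross-lingual semantic adequacy
    "cometkiwi": np.random.rand(200),         # reference-free quality estimate
    "pause_rate": np.random.rand(200),        # normalized silent-pause frequency
    "phrase_diversity": np.random.rand(200),  # Chinese phraseological diversity
})
# Imbalanced rubric bands (e.g., low / mid / high), mimicking scarce high-band samples.
y = np.random.choice([0, 1, 2], size=200, p=[0.7, 0.2, 0.1])

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Oversample minority bands in the training split only, then fit the scorer.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
model = GradientBoostingClassifier(random_state=0).fit(X_res, y_res)
print(f"held-out accuracy: {model.score(X_test, y_test):.2f}")
```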
📝 Abstract
Recent advancements in machine learning have spurred growing interest in automated interpreting quality assessment. Nevertheless, existing research suffers from insufficient examination of language use quality, unsatisfactory modeling effectiveness due to data scarcity and imbalance, and a lack of efforts to explain model predictions. To address these gaps, we propose a multi-dimensional modeling framework that integrates feature engineering, data augmentation, and explainable machine learning. This approach prioritizes explainability over "black box" prediction by using only construct-relevant, transparent features and conducting Shapley additive explanations (SHAP) analysis. Our results demonstrate strong predictive performance on a novel English–Chinese consecutive interpreting dataset, identifying BLEURT and CometKiwi scores as the strongest predictors of fidelity, pause-related features as the strongest predictors of fluency, and Chinese-specific phraseological diversity metrics as the strongest predictors of language use. Overall, by placing particular emphasis on explainability, we present a scalable, reliable, and transparent alternative to traditional human evaluation, one that facilitates detailed diagnostic feedback for learners and supports self-regulated learning in ways that automated scores alone cannot.
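As a companion illustration of the attribution step, the sketch below computes SHAP values for a tree-based regressor predicting a fidelity score and ranks features by mean absolute contribution. The feature names, the synthetic target (deliberately dominated by the semantic metrics), and the regressor are assumptions for demonstration only, not the paper's actual models or data.

```python
# Minimal, self-contained sketch: SHAP attribution for a fidelity-score regressor.
# All feature names and the synthetic target are illustrative assumptions.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "bleurt": rng.random(300),
    "cometkiwi": rng.random(300),
    "pause_rate": rng.random(300),
    "phrase_diversity": rng.random(300),
})
# Synthetic fidelity score driven mostly by the cross-lingual semantic metrics.
y = 0.6 * X["bleurt"] + 0.3 * X["cometkiwi"] + 0.1 * rng.random(300)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# TreeExplainer gives exact SHAP values for tree ensembles; rank features by mean |SHAP|.
shap_values = shap.TreeExplainer(model).shap_values(X)
ranking = np.abs(shap_values).mean(axis=0)
for name, value in sorted(zip(X.columns, ranking), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

In this setup the per-feature mean |SHAP| ranking would place `bleurt` and `cometkiwi` at the top, which is the kind of dimension-level diagnostic the framework surfaces for learners.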