Exploring Human-AI Complementarity in CPS Diagnosis Using Unimodal and Multimodal BERT Models

📅 2025-07-19

🤖 AI Summary

Problem: Automated detection of collaborative problem solving (CPS) indicators in dialogue remains a significant challenge in educational AI. Earlier work showed that the multimodal AudiBERT model improved on text-only BERT, but the statistical significance of those gains was unclear, and there was little guidance on leveraging human-AI complementarity for CPS diagnosis. Method: The paper compares a BERT model trained on transcripts with AudiBERT, which combines speech with acoustic-prosodic audio features, examines class-wise differences for statistical significance, and correlates model performance with training data size and inter-rater agreement. Contribution/Results: AudiBERT improved classification of sparse classes and achieved statistically significant class-wise improvements over BERT in the social-cognitive dimension, but not in the affective dimension. Larger training data was significantly associated with higher recall for both models, and the precision of the BERT model was significantly associated with high inter-rater agreement among human coders. The paper closes by outlining a structured approach to human-AI complementarity in CPS diagnosis, with model explainability included to support human agency in the reflective coding process.

📝 Abstract

Detecting collaborative problem solving (CPS) indicators from dialogue using machine learning techniques is a significant challenge for the field of AI in Education. Recent studies have explored the use of Bidirectional Encoder Representations from Transformers (BERT) models on transcription data to reliably detect meaningful CPS indicators. A notable advancement involved the multimodal BERT variant, AudiBERT, which integrates speech and acoustic-prosodic audio features to enhance CPS diagnosis. Although initial results demonstrated multimodal improvements, the statistical significance of these enhancements remained unclear, and there was insufficient guidance on leveraging human-AI complementarity for CPS diagnosis tasks. This workshop paper extends the previous research by highlighting that the AudiBERT model not only improved the classification of classes that were sparse in the dataset, but it also had statistically significant class-wise improvements over the BERT model for classifications in the social-cognitive dimension. However, similar significant class-wise improvements over the BERT model were not observed for classifications in the affective dimension. A correlation analysis highlighted that larger training data was significantly associated with higher recall performance for both the AudiBERT and BERT models. Additionally, the precision of the BERT model was significantly associated with high inter-rater agreement among human coders. When employing the BERT model to diagnose indicators within these subskills that were well-detected by the AudiBERT model, the performance across all indicators was inconsistent. We conclude the paper by outlining a structured approach towards achieving human-AI complementarity for CPS diagnosis, highlighting the crucial inclusion of model explainability to support human agency and engagement in the reflective coding process.
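The correlation analysis described in the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient was used, so Spearman's rank correlation is an assumption here, and the helper names (`per_class_precision_recall`, `average_ranks`, `spearman_rho`) and the toy numbers in the usage section are hypothetical, not taken from the paper. The sketch computes per-class precision and recall from gold and predicted labels, then correlates per-class training support with recall.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical per-indicator values, made up purely for illustration.
    train_support = [520, 310, 95, 40, 18]
    recall = [0.81, 0.74, 0.52, 0.55, 0.20]
    print("Spearman rho (support vs. recall):", spearman_rho(train_support, recall))
```

The same `spearman_rho` helper could be applied to per-indicator inter-rater agreement and precision. Testing whether a coefficient is statistically significant needs an additional step, such as a permutation test, which this sketch leaves out.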

Problem

Research questions and friction points this paper is trying to address.

Detecting CPS indicators from dialogue using machine learning techniques

Evaluating whether the multimodal AudiBERT model's improvements over BERT in CPS diagnosis are statistically significant

Developing human-AI complementarity strategies for effective CPS diagnosis

Innovation

Methods, ideas, or system contributions that make the work stand out.

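AudiBERT, a multimodal BERT variant, integrates speech with acoustic-prosodic audio features

AudiBERT improves classification of sparse classes and shows statistically significant class-wise gains over BERT in the social-cognitive dimension

A structured approach to human-AI complementarity that includes model explainability to support human agency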
