SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning

📅 2025-07-14
🤖 AI Summary
High student attrition in online education calls for early, accurate prediction from multiple data sources so that interventions can be timely. This paper proposes a multimodal fusion framework that jointly models heterogeneous data: sentiment features extracted from student comments with a fine-tuned BERT model are merged with key socio-demographic and behavioral features selected through XGBoost feature importance. The aim is to capture the multifactorial drivers of dropout better than a model built on a single data source. Tested on unseen data from the following academic year, the model achieves 84% accuracy, two percentage points above the 82% baseline, and also scores higher on precision and F1-score. These results support jointly modeling sentiment with structured behavioral and socio-demographic features for early dropout risk prediction.

📝 Abstract
School dropout is a serious problem in distance learning, where early detection is crucial for effective intervention and student perseverance. Predicting student dropout using available educational data is a widely researched topic in learning analytics. Our partner's distance learning platform highlights the importance of integrating diverse data sources, including socio-demographic data, behavioral data, and sentiment analysis, to accurately predict dropout risks. In this paper, we introduce a novel model that combines sentiment analysis of student comments using the Bidirectional Encoder Representations from Transformers (BERT) model with socio-demographic and behavioral data analyzed through Extreme Gradient Boosting (XGBoost). We fine-tuned BERT on student comments to capture nuanced sentiments, which were then merged with key features selected using feature importance techniques in XGBoost. Our model was tested on unseen data from the next academic year, achieving an accuracy of 84%, compared to 82% for the baseline model. Additionally, the model demonstrated superior performance in other metrics, such as precision and F1-score. The proposed method could be a vital tool in developing personalized strategies to reduce dropout rates and encourage student perseverance.
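The evaluation protocol described in the abstract (train on one academic year, test on the next, then report accuracy, precision, and F1-score) can be illustrated with a small sketch. The record layout, the years, the labels, and the predictions below are all hypothetical and are not taken from the paper; the code only shows how such a temporal hold-out and the three metrics might be computed.

```python
from typing import Dict, List, Tuple

Record = Dict[str, int]

def split_by_year(records: List[Record], test_year: int) -> Tuple[List[Record], List[Record]]:
    """Hold out one academic year for testing; earlier years are used for training."""
    train = [r for r in records if r["year"] < test_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test

def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (1 = dropout)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical records: academic year plus dropout label (1 = dropped out).
records = (
    [{"year": 2022, "label": label} for label in [1, 0, 0, 1]]
    + [{"year": 2023, "label": label} for label in [1, 0, 1, 1, 0, 0, 1, 0]]
)
train, test = split_by_year(records, test_year=2023)

# Hypothetical predictions for the held-out year, standing in for a trained model's output.
y_true = [r["label"] for r in test]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Splitting by year rather than at random keeps future cohorts out of the training set, which matches the "unseen data from the next academic year" setting described in the abstract.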
Problem

Research questions and friction points this paper is trying to address.

Predicting student dropout in distance learning using multimodal data
Integrating sentiment analysis with socio-demographic and behavioral data
Improving accuracy of dropout prediction to support student perseverance
Innovation

Methods, ideas, or system contributions that make the work stand out.

BERT for sentiment analysis of student comments
XGBoost for socio-demographic and behavioral data
Combining BERT and XGBoost for dropout prediction
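As a rough illustration of how the two parts listed above might be combined, the sketch below selects the most important tabular features and concatenates them with sentiment scores into one feature vector. The source says only that the sentiment features were "merged" with the selected features, so simple concatenation is an assumption here. The feature names, importance scores, and sentiment probabilities are hypothetical stand-ins for what a trained XGBoost model and a fine-tuned BERT classifier would produce.

```python
from typing import Dict, List

def select_top_k(importances: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the highest importance (ties broken by name)."""
    ranked = sorted(importances.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]

def fuse(sentiment: List[float], tabular: Dict[str, float], keep: List[str]) -> List[float]:
    """Concatenate sentiment scores with the selected tabular features (assumed fusion)."""
    return list(sentiment) + [tabular[name] for name in keep]

# Hypothetical importance scores, standing in for those reported by a trained XGBoost model.
importances = {
    "logins_per_week": 0.41,
    "assignments_late": 0.33,
    "age": 0.07,
    "region": 0.02,
}
keep = select_top_k(importances, k=2)

# Hypothetical [negative, neutral, positive] probabilities, standing in for
# the output of a BERT model fine-tuned on student comments.
sentiment = [0.70, 0.20, 0.10]

# Hypothetical socio-demographic and behavioral values for one student.
student = {"logins_per_week": 1.5, "assignments_late": 4.0, "age": 34.0, "region": 2.0}

fused = fuse(sentiment, student, keep)
print(keep)   # selected tabular features
print(fused)  # fused feature vector for a downstream dropout classifier
```

The fused vector would then be passed to a final classifier; which classifier the authors use at that stage is not stated in this summary.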
Meriem Zerkouk
Artificial Intelligence Institute, University of Téluq, 5800, rue Saint-Denis, Montreal, Quebec, H2S 3L5, Canada
Miloud Mihoubi
Artificial Intelligence Institute, University of Téluq, 5800, rue Saint-Denis, Montreal, Quebec, H2S 3L5, Canada
Belkacem Chikhaoui
Full professor, TELUQ University, Montreal
Data mining, machine learning, quantum machine learning, artificial intelligence