🤖 AI Summary
High student attrition rates in online education call for early, accurate prediction from multi-source data so that interventions can be timely. This paper proposes a multimodal fusion framework that jointly models heterogeneous data: fine-grained sentiment features extracted from student comments with a fine-tuned BERT model are combined with key socio-demographic and behavioral features selected by XGBoost feature importance. The aim is to capture multifactorial dropout drivers that a model restricted to a single data source would miss. Evaluated on unseen data from the following academic year, the model achieves 84% accuracy, two percentage points above the 82% baseline, and also scores higher on precision and F1-score. These results support jointly modeling semantic sentiment information with structured behavioral and demographic features for early dropout risk prediction.
📝 Abstract
Student dropout is a serious problem in distance learning, where early detection is crucial for effective intervention and student perseverance. Predicting student dropout from available educational data is a widely researched topic in learning analytics. Our work with a partner's distance learning platform highlights the importance of integrating diverse data sources, including socio-demographic data, behavioral data, and sentiment expressed in student comments, to accurately predict dropout risk. In this paper, we introduce a novel model that combines sentiment analysis of student comments using the Bidirectional Encoder Representations from Transformers (BERT) model with socio-demographic and behavioral data analyzed through Extreme Gradient Boosting (XGBoost). We fine-tuned BERT on student comments to capture nuanced sentiments, which were then merged with key features selected using feature importance techniques in XGBoost. Our model was tested on unseen data from the next academic year, achieving an accuracy of 84%, compared to 82% for the baseline model. The model also outperformed the baseline on other metrics, such as precision and F1-score. The proposed method could be a vital tool in developing personalized strategies to reduce dropout rates and encourage student perseverance.
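
The fusion step described above can be illustrated with a minimal sketch. The abstract does not specify how the sentiment output and the selected features are merged, so simple concatenation is assumed here. The feature names, importance scores, and sentiment probabilities are all hypothetical placeholders standing in for what a trained XGBoost model and a fine-tuned BERT classifier might produce; none of them come from the paper.

```python
from typing import Dict, List, Sequence


def select_top_features(importances: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def fuse(sentiment_probs: Sequence[float],
         student_record: Dict[str, float],
         selected: Sequence[str]) -> List[float]:
    """Concatenate sentiment probabilities with the selected tabular features."""
    return list(sentiment_probs) + [student_record[name] for name in selected]


# Hypothetical importance scores, standing in for those reported by a trained
# XGBoost model. The feature names are illustrative only.
importances = {
    "age": 0.05,
    "logins_per_week": 0.40,
    "assignments_submitted": 0.35,
    "region_code": 0.20,
}
selected = select_top_features(importances, k=2)

# Hypothetical sentiment output (negative, neutral, positive) for one student's
# comment, standing in for the output of a fine-tuned BERT classifier.
sentiment_probs = [0.7, 0.2, 0.1]

# Hypothetical socio-demographic and behavioral record for the same student.
record = {
    "age": 31.0,
    "logins_per_week": 1.0,
    "assignments_submitted": 2.0,
    "region_code": 4.0,
}

fused = fuse(sentiment_probs, record, selected)
print(selected)
print(fused)
```

Under this assumption, the fused vector would be the input to a final dropout classifier. Other merging strategies, such as feeding a learned text embedding rather than class probabilities, are equally compatible with the description in the abstract.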