🤖 AI Summary
This study addresses the challenges of early pathological gambling (PG) detection, namely symptom subtlety and severe data scarcity. We propose a two-stage text classification framework that integrates Transformer-based pre-trained models (e.g., RoBERTa) with an LSTM for hierarchical feature extraction: RoBERTa captures contextual semantic representations, while the LSTM models sequential dependencies in the text. To enhance robustness, we introduce domain-specific text preprocessing (including cleaning and stemming) and a hybrid SMOTE-Tomek sampling strategy to mitigate extreme class imbalance. Evaluated on an international PG detection benchmark dataset, our model achieves an F1-score of 0.126, ranking 7th among 49 competing teams, and attains top performance in recall (0.213) and early-symptom identification accuracy. These results demonstrate strong sensitivity to the low-frequency, latent risk expressions characteristic of incipient PG, supporting the framework's efficacy for early, fine-grained behavioral risk detection.
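The hybrid SMOTE-Tomek strategy mentioned above combines two steps: SMOTE oversamples the minority class by interpolating between minority points and their nearest minority neighbors, and Tomek-link cleaning then removes majority points that sit right on the class boundary. The standard implementation is `SMOTETomek` in the imbalanced-learn library; the self-contained NumPy toy below (not the authors' code, and simplified to binary 0/1 labels) is only meant to illustrate the mechanics:

```python
import numpy as np

def smote_tomek(X, y, minority_label=1, k=3, seed=0):
    """Toy SMOTE-Tomek for binary labels {0, 1}: oversample the minority
    class by interpolation, then drop majority points in Tomek links."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    X_maj = X[y != minority_label]

    # --- SMOTE: synthesize minority points until the classes balance ---
    synth = []
    for _ in range(len(X_maj) - len(X_min)):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        d[i] = np.inf                        # exclude the point itself
        j = rng.choice(np.argsort(d)[:k])    # one of its k nearest neighbors
        lam = rng.random()                   # interpolation factor in [0, 1)
        synth.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    X_min_os = np.vstack([X_min] + [np.array(synth)]) if synth else X_min

    # --- Tomek links: mutual nearest neighbors with different labels;
    #     remove the majority-class member of each such pair ---
    X_all = np.vstack([X_min_os, X_maj])
    y_all = np.concatenate([np.full(len(X_min_os), minority_label),
                            np.full(len(X_maj), 1 - minority_label)])
    D = np.linalg.norm(X_all[:, None] - X_all[None], axis=2)
    np.fill_diagonal(D, np.inf)
    nn = D.argmin(axis=1)
    keep = np.ones(len(X_all), dtype=bool)
    for i in range(len(X_all)):
        if (y_all[i] != minority_label
                and y_all[nn[i]] == minority_label
                and nn[nn[i]] == i):
            keep[i] = False                  # majority point in a Tomek link
    return X_all[keep], y_all[keep]

# Illustrative 40-vs-8 imbalanced dataset
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(3, 1, (8, 2))])
y = np.array([0] * 40 + [1] * 8)
X_res, y_res = smote_tomek(X, y)             # classes now roughly balanced
```

Note that Tomek cleaning only ever removes majority points here, so after resampling the minority count equals the original majority count while the majority count may shrink slightly.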
📝 Abstract
This paper describes the participation of the SINAI team in the eRisk@CLEF lab. Specifically, we address one of the proposed tasks: Task 2, on the early detection of signs of pathological gambling. Our approach for Task 2 is based on pre-trained Transformer models combined with comprehensive data preprocessing and data balancing techniques. Moreover, we integrate a Long Short-Term Memory (LSTM) architecture with AutoModel classes from the Transformers library. In this task, our team ranked seventh out of 49 participant submissions, with an F1 score of 0.126, and achieved the highest values in recall and in the metrics related to early detection.
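The Transformer-plus-LSTM integration described above can be sketched as a small PyTorch module: contextual token embeddings (e.g., the last hidden states of a RoBERTa AutoModel) are fed to a bidirectional LSTM, whose final hidden states feed a classification head. This is a minimal illustration under assumed sizes, not the authors' implementation; the random tensor stands in for precomputed RoBERTa embeddings:

```python
import torch
import torch.nn as nn

class TransformerLSTMClassifier(nn.Module):
    """Sketch: LSTM over contextual token embeddings, then a linear head.
    emb_dim=768 matches RoBERTa-base; hidden size is illustrative."""
    def __init__(self, emb_dim=768, hidden=128, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, token_embs):            # (batch, seq_len, emb_dim)
        _, (h, _) = self.lstm(token_embs)     # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=-1)   # concat both directions
        return self.head(h)                   # (batch, n_classes) logits

model = TransformerLSTMClassifier()
# Stand-in for RoBERTa last-hidden-state output: 4 posts, 16 tokens each
logits = model(torch.randn(4, 16, 768))
```

In a full pipeline, `token_embs` would come from something like `AutoModel.from_pretrained("roberta-base")(**tokenized_batch).last_hidden_state`, and the logits would be trained with cross-entropy on the resampled data.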