SINAI at eRisk@CLEF 2023: Approaching Early Detection of Gambling with Natural Language Processing

📅 2025-09-18
🤖 AI Summary
This study addresses two challenges in the early detection of pathological gambling (PG): symptom subtlety and severe data scarcity. We propose a two-stage text classification framework that combines Transformer-based pre-trained models (e.g., RoBERTa) with an LSTM for hierarchical feature extraction: RoBERTa captures contextual semantic representations, while the LSTM models sequential dependencies across the text. To improve robustness, we apply domain-specific text preprocessing (including cleaning and stemming) and a hybrid SMOTE-Tomek sampling strategy to mitigate extreme class imbalance. Evaluated on eRisk@CLEF 2023 Task 2, our model achieves an F1-score of 0.126, ranking 7th among 49 participant submissions, and attains the highest recall (0.213) and the highest values on metrics related to early detection. These results suggest sensitivity to the low-frequency, latent risk expressions characteristic of incipient PG, supporting the framework's use for early, fine-grained behavioral risk detection.

📝 Abstract
This paper describes the participation of the SINAI team in the eRisk@CLEF lab. Specifically, we addressed one of the proposed tasks: Task 2, on the early detection of signs of pathological gambling. Our approach to Task 2 is based on pre-trained Transformer models, combined with comprehensive data preprocessing and data balancing techniques. Moreover, we integrate a Long Short-Term Memory (LSTM) architecture with auto-models from Transformers. In this task, our team ranked seventh out of 49 participant submissions, with an F1 score of 0.126, and achieved the highest values in recall and in metrics related to early detection.
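The abstract describes integrating an LSTM with Transformer auto-models but gives no architectural detail. The following is a minimal sketch of one way such a combination could look, not the authors' implementation. The class name `EncoderWithLSTM`, the helper `build_tiny_model`, the bidirectional LSTM, and all layer sizes are assumptions chosen for illustration; the encoder is built from a tiny random RoBERTa configuration so the example runs without downloading weights.

```python
import torch
from torch import nn
from transformers import AutoModel, RobertaConfig


class EncoderWithLSTM(nn.Module):
    """Hypothetical classifier: Transformer encoder -> LSTM -> linear layer."""

    def __init__(self, encoder, lstm_hidden=64, num_labels=2):
        super().__init__()
        self.encoder = encoder
        # The LSTM reads the encoder's per-token hidden states in order.
        self.lstm = nn.LSTM(
            input_size=encoder.config.hidden_size,
            hidden_size=lstm_hidden,
            batch_first=True,
            bidirectional=True,
        )
        self.classifier = nn.Linear(2 * lstm_hidden, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state  # (batch, seq_len, hidden_size)
        # h_n holds the final state of each direction: (2, batch, lstm_hidden).
        # Padding is not packed here, so padded positions also pass through
        # the LSTM; a real system may want pack_padded_sequence.
        _, (h_n, _) = self.lstm(hidden)
        features = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.classifier(features)  # (batch, num_labels)


def build_tiny_model():
    """Build the sketch around a small, randomly initialised RoBERTa (assumed sizes)."""
    config = RobertaConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=1,
        num_attention_heads=2,
        intermediate_size=64,
        max_position_embeddings=40,
    )
    return EncoderWithLSTM(AutoModel.from_config(config))
```

To experiment with pre-trained weights instead, the encoder could be loaded with `AutoModel.from_pretrained("roberta-base")` and passed to `EncoderWithLSTM` in the same way; which checkpoint the team actually used is not stated here.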
Problem

Research questions and friction points this paper is trying to address.

Early detection of pathological gambling signs
Using NLP for gambling behavior identification
Applying transformers and LSTM for risk assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based pre-trained models
LSTM integration with auto-models
Comprehensive preprocessing and data balancing
Alba Maria Marmol-Romero
Computer Science Department, SINAI, CEATIC, Universidad de Jaén, 23071, Spain
Flor Miriam Plaza-del-Arco
Assistant Professor, Leiden University
Natural Language Processing, Computational Social Science, Online harms, Affective Computing, Ethics
Arturo Montejo-Raez
Computer Science Department, SINAI, CEATIC, Universidad de Jaén, 23071, Spain