🤖 AI Summary
This study addresses the underexplored challenge of financial fraud detection in low-resource, multilingual settings, focusing on code-mixed Bengali-English financial texts, a domain largely neglected by existing research built predominantly on English data. The work presents the first systematic investigation of this problem, benchmarking traditional machine learning approaches (TF-IDF features with logistic regression, linear SVM, and ensemble classifiers) against a state-of-the-art Transformer-based model. Experimental results show that the linear SVM achieves the best performance, with 91.59% accuracy and a 91.30% F1-score, outperforming the Transformer despite the latter's higher fraud recall of 94.19%, which comes at the cost of an elevated false positive rate. These findings highlight the challenges and feature dynamics inherent to code-mixed, low-resource financial text analysis and offer insights for advancing multilingual financial security systems.
📝 Abstract
Financial fraud detection has emerged as a critical research challenge amid the rapid expansion of digital financial platforms. Although machine learning approaches have demonstrated strong performance in identifying fraudulent activities, most existing research focuses exclusively on English-language data, limiting applicability to multilingual contexts. Bangla (Bengali), despite being spoken by over 250 million people, remains largely unexplored in this domain. In this work, we investigate financial fraud detection in a multilingual Bangla-English setting using a dataset comprising legitimate and fraudulent financial messages. We evaluate classical machine learning models (Logistic Regression, Linear SVM, and Ensemble classifiers) using TF-IDF features alongside transformer-based architectures. Experimental results using 5-fold stratified cross-validation demonstrate that Linear SVM achieves the best performance with 91.59% accuracy and 91.30% F1-score, outperforming the transformer model (89.49% accuracy, 88.88% F1) by approximately 2 percentage points. The transformer exhibits higher fraud recall (94.19%) but suffers from elevated false positive rates. Exploratory analysis reveals distinctive patterns: scam messages are longer, contain urgency-inducing terms, and frequently include URLs (32%) and phone numbers (97%), while legitimate messages feature transactional confirmations and specific currency references. Our findings highlight that classical machine learning with well-crafted features remains competitive for multilingual fraud detection, while also underscoring the challenges posed by linguistic diversity, code-mixing, and low-resource language constraints.
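The classical pipeline the abstract describes (TF-IDF features into a Linear SVM, evaluated with 5-fold stratified cross-validation) can be sketched as below. This is a minimal illustration, not the authors' implementation: the tiny code-mixed corpus is invented, the hyperparameters (n-gram range, regularization) are assumptions, and the URL/phone-number regexes approximating the exploratory surface features are rough guesses.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative corpus of code-mixed Bangla-English messages,
# invented for this sketch (NOT the paper's dataset).
messages = [
    # Fraudulent (label 1): urgency terms, URLs, phone numbers.
    "Congratulations! আপনি 50000 টাকা জিতেছেন! Claim now http://bit.ly/win call 01712345678",
    "URGENT: আপনার bKash account blocked. Verify at www.fake-bkash.com or call 01898765432",
    "You won a lottery! Send 500 টাকা to 01511112222 to claim your prize আজই",
    "Dear customer, আপনার account will be suspended. Click http://scam.example.com now",
    "Free recharge offer! Send your PIN to 01633334444 immediately",
    # Legitimate (label 0): transactional confirmations, currency amounts.
    "Cash In Tk 2,000.00 from 01712000000 successful. Balance Tk 5,250.00",
    "আপনার bill payment of Tk 850.00 to DESCO is complete. TrxID ABC123",
    "You have received Tk 1,500.00. Fee Tk 0.00. Balance Tk 3,000.00",
    "Payment Tk 320.00 to merchant successful. Ref 556677",
    "আপনার mobile recharge Tk 50.00 successful. Balance Tk 120.00",
]
labels = [1] * 5 + [0] * 5  # 1 = fraud, 0 = legitimate

# TF-IDF over word unigrams/bigrams feeding a linear SVM
# (hyperparameters here are illustrative assumptions).
pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
    LinearSVC(C=1.0),
)

# 5-fold stratified cross-validation, matching the evaluation protocol
# named in the abstract (accuracy shown; the paper also reports F1).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, messages, labels, cv=cv, scoring="accuracy")

# Surface features from the exploratory analysis: URL and phone-number
# presence. Both patterns are rough, hypothetical approximations.
url_re = re.compile(r"https?://\S+|www\.\S+")
phone_re = re.compile(r"\b01\d{9}\b")  # assumed 11-digit local mobile format
has_url = [bool(url_re.search(m)) for m in messages]
has_phone = [bool(phone_re.search(m)) for m in messages]
```

On a real dataset of this kind, such boolean URL/phone indicators could be concatenated with the TF-IDF matrix (e.g. via `FeatureUnion`) rather than inspected separately; they are kept apart here only to mirror the exploratory analysis.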