🤖 AI Summary
This study addresses the automatic detection of AI-generated academic essays in English and Arabic. Methodologically, we propose a bilingual discriminative framework that integrates pretrained language models with stylometric features: ELECTRA is adapted for English academic texts and AraELECTRA for Arabic, and their deep semantic representations are modeled jointly with fine-grained stylistic features, including lexical frequency distributions, syntactic complexity metrics, and n-gram patterns. Evaluated on a benchmark dataset, our approach achieves an F1 score of 99.7% on the English subtask (ranking 2nd among 26 participating teams) and 98.4% on the Arabic subtask (1st among 23 teams). This work offers a cross-lingual approach to detecting AI-generated content that extends to Arabic, supporting both multilingual NLP and academic integrity assurance.
📝 Abstract
Recent research has investigated the problem of detecting machine-generated essays for academic purposes. To address this challenge, this work fine-tunes pre-trained, transformer-based models on Arabic and English academic essays and combines them with stylometric features. Custom models based on ELECTRA for English and AraELECTRA for Arabic were trained and evaluated on a benchmark dataset. The proposed models achieved an F1-score of 99.7% in the English subtask, ranking 2nd out of 26 teams, and 98.4% in the Arabic subtask, finishing 1st out of 23 teams.
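To make the notion of stylometric features more concrete, the following is a minimal sketch of how a few such features could be computed from raw text. It is not the authors' feature set: the function name `stylometric_features`, the specific features chosen (type-token ratio, average word and sentence length, hapax ratio, character n-grams), and the tokenization rules are all assumptions made for illustration.

```python
import re
from collections import Counter


def stylometric_features(text: str, n: int = 3) -> dict:
    """Compute a few simple stylometric features (illustrative, hypothetical set)."""
    # Split on sentence-ending punctuation, including the Arabic question mark (U+061F).
    sentences = [s for s in re.split(r"[.!?\u061F]+", text) if s.strip()]
    # \w is Unicode-aware for str patterns in Python 3, so Arabic words are matched too.
    words = re.findall(r"\w+", text.lower())

    if not words:
        # Empty or punctuation-only input: return neutral values.
        return {
            "type_token_ratio": 0.0,
            "avg_word_length": 0.0,
            "avg_sentence_length": 0.0,
            "hapax_ratio": 0.0,
            "top_char_ngrams": [],
        }

    word_counts = Counter(words)
    # Character n-grams over the raw text (assumed n=3 by default).
    char_ngrams = Counter(text[i:i + n] for i in range(len(text) - n + 1))

    return {
        # Lexical diversity: distinct words divided by total words.
        "type_token_ratio": len(word_counts) / len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # A crude proxy for syntactic complexity: words per sentence.
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        # Share of word tokens whose word type occurs exactly once.
        "hapax_ratio": sum(1 for c in word_counts.values() if c == 1) / len(words),
        "top_char_ngrams": char_ngrams.most_common(5),
    }


if __name__ == "__main__":
    print(stylometric_features("The cat sat. The cat ran."))
```

The abstract does not specify how these features are combined with the ELECTRA and AraELECTRA representations; concatenating a numeric feature vector with the model's pooled output before a classification layer is one common option, but that is an assumption rather than a description of the paper's method.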