🤖 AI Summary
This study addresses the suboptimal performance of multilingual sentiment analysis on morphologically complex languages—particularly Arabic—by systematically evaluating BERT, mBERT, and XLM-R across major European languages and Arabic. We propose a linguistic-feature-driven fine-tuning strategy comprising morphologically sensitive input preprocessing and hierarchical optimization tailored to low-resource languages. Experimental results show that XLM-R achieves 88.2% accuracy on Arabic, substantially outperforming BERT (79.5%) and mBERT (82.1%), attributable to its superior cross-lingual representation capacity. Our fine-tuning approach yields an average 3.6-percentage-point gain for low-resource languages. This work provides the first empirical evidence of XLM-R’s advantage in modeling highly inflected languages and establishes a reusable, linguistically grounded optimization paradigm for low-resource sentiment analysis.
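The summary does not spell out what the morphologically sensitive preprocessing consists of, so the sketch below is only a hedged illustration of what such a step commonly looks like for Arabic: stripping diacritics and elongation marks and collapsing orthographic variants before tokenization. The character choices follow standard Arabic NLP normalization practice and are an assumption, not necessarily the authors' exact pipeline.

```python
import re

# Assumed normalization rules (standard Arabic NLP practice; the paper's
# actual "morphologically sensitive preprocessing" may differ):
# tashkeel (short vowels) U+064B-U+0652, dagger alef U+0670, tatweel U+0640.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670\u0640]")

def normalize_arabic(text: str) -> str:
    """Collapse Arabic orthographic variation before tokenization."""
    text = _DIACRITICS.sub("", text)                        # drop diacritics / elongation
    text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    text = re.sub(r"\u0649", "\u064A", text)                # alef maqsura -> ya
    text = re.sub(r"\u0629", "\u0647", text)                # ta marbuta -> ha
    return text

print(normalize_arabic("كِتَابٌ"))  # -> "كتاب"
```

Normalization of this kind reduces sparsity in subword vocabularies, which is one plausible mechanism behind the reported gains on highly inflected input.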
📝 Abstract
This study explores transformer-based models such as BERT, mBERT, and XLM-R for multilingual sentiment analysis across diverse linguistic structures. Key contributions include the identification of XLM-R's superior adaptability to morphologically complex languages, achieving accuracy above 88%. The work details fine-tuning strategies and their significance for improving sentiment classification in underrepresented languages.
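As a concrete, non-authoritative reading of the fine-tuning setup: the sketch below fine-tunes `xlm-roberta-base` for 3-way sentiment classification with HuggingFace `transformers`, interpreting "hierarchical optimization" as layer-wise learning-rate decay. The label scheme, learning rate, decay factor, and toy batch are all assumptions for illustration, not the paper's reported configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical setup: 3-way sentiment (0=negative, 1=neutral, 2=positive).
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3
)

# Layer-wise learning-rate decay (one common reading of "hierarchical
# optimization"): upper, task-specific layers get the full rate; lower,
# language-general layers get progressively smaller rates.
base_lr, decay = 2e-5, 0.9
layers = model.roberta.encoder.layer  # 12 transformer layers in the base model
param_groups = [{"params": model.roberta.embeddings.parameters(),
                 "lr": base_lr * decay ** len(layers)}]
for i, layer in enumerate(layers):
    param_groups.append({"params": layer.parameters(),
                         "lr": base_lr * decay ** (len(layers) - 1 - i)})
param_groups.append({"params": model.classifier.parameters(), "lr": base_lr})
optimizer = torch.optim.AdamW(param_groups)

# Toy batch of hypothetical examples; real training iterates a DataLoader.
texts = ["هذا المنتج رائع جدا", "الخدمة سيئة للغاية"]
labels = torch.tensor([2, 0])  # positive, negative
batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=128, return_tensors="pt")

model.train()
for _ in range(3):  # epochs over the toy batch
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Decayed per-layer rates are a standard way to preserve pretrained cross-lingual representations while adapting the classifier head, which is consistent with the low-resource focus described above.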