🤖 AI Summary
This study addresses the fine-grained identification of "hope" sentiment in text, proposing a Transformer-based comparative framework for binary (Hope vs. Not Hope) and multiclass (five hope-related categories) classification, with target applications in mental health monitoring and social media analysis. It systematically evaluates BERT, GPT-2, and DeBERTa, and the results suggest that architectural suitability can matter more than parameter count: BERT achieves the best overall performance, with 84.49% binary classification accuracy at the lowest reported training cost, while GPT-2 shows the strongest recall on sarcastic hope expressions (92.46%). Error analysis identifies context dependency, irony, and metaphor as the primary challenges to robust hope detection. The work offers a reproducible and computationally efficient pathway for hope recognition in affective computing, supported by empirical comparison across model architectures and linguistic phenomena.
📝 Abstract
This paper presents a transformer-based approach for classifying hope expressions in text. We developed and compared three architectures (BERT, GPT-2, and DeBERTa) for both binary classification (Hope vs. Not Hope) and multiclass categorization (five hope-related categories). Our initial BERT implementation achieved 83.65% binary and 74.87% multiclass accuracy. In the extended comparison, BERT outperformed the other two models (84.49% binary, 72.03% multiclass accuracy) while requiring significantly fewer computational resources (443s vs. 704s training time) than newer architectures. GPT-2 showed the lowest overall accuracy (79.34% binary, 71.29% multiclass), while DeBERTa achieved moderate results (80.70% binary, 71.56% multiclass) at substantially higher computational cost (947s for multiclass training). Error analysis revealed architecture-specific strengths in detecting nuanced hope expressions, with GPT-2 excelling at sarcasm detection (92.46% recall). This study provides a framework for computational analysis of hope, with applications in mental health and social media analysis, and indicates that architectural suitability may outweigh model size for specialized emotion detection tasks.
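The abstract reports two kinds of figures: overall accuracy and per-class recall (for example, recall on sarcasm). As a reading aid, the sketch below shows how these two metrics are conventionally computed from gold and predicted labels. It is not the authors' evaluation code; the function names, the label strings, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the gold label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each gold class: correctly predicted examples / all gold examples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy data with hypothetical label names (not taken from the paper's dataset).
gold = ["hope", "not_hope", "sarcasm", "sarcasm", "hope", "not_hope"]
pred = ["hope", "not_hope", "sarcasm", "hope", "hope", "hope"]

overall = accuracy(gold, pred)
recall = per_class_recall(gold, pred)
print(f"accuracy = {overall:.4f}")
for label, value in sorted(recall.items()):
    print(f"recall[{label}] = {value:.4f}")
```

The distinction matters for reading the results: a model can have the lowest overall accuracy and still have the highest recall on one class, which is the pattern the abstract describes for GPT-2 on sarcasm.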