🤖 AI Summary
Traditional Transformers often compress global context into a single [CLS] token, which can discard local and hierarchical semantics. To address this, we propose the Inceptive Transformer, a lightweight Transformer variant that combines Inception-style multi-branch convolutions with self-attention for multi-scale feature extraction, together with a task-aware dynamic token-reweighting strategy that adaptively scores token importance across languages and domains. Evaluated on a diverse set of tasks (emotion recognition in English and Bangla, irony detection, disease identification, and classification of anti-COVID-vaccine tweets) the Inceptive Transformer achieves consistent gains of 1%–14% over strong baselines while remaining computationally efficient. The architecture thus bridges local pattern capture and global contextual modeling without compromising scalability or cross-domain generalizability.
📝 Abstract
Conventional transformer models typically compress the information from all tokens in a sequence into a single [CLS] token to represent global context, an approach that can lead to information loss in tasks requiring localized or hierarchical cues. In this work, we introduce the Inceptive Transformer, a modular and lightweight architecture that enriches transformer-based token representations by integrating a multi-scale feature extraction module inspired by inception networks. Our model is designed to balance local and global dependencies by dynamically weighting tokens based on their relevance to a particular task. Evaluation across a diverse range of tasks, including emotion recognition (both English and Bangla), irony detection, disease identification, and classification of anti-COVID-vaccine tweets, shows that our models consistently outperform the baselines by 1% to 14% while maintaining efficiency. These findings highlight the versatility and cross-lingual applicability of our method for enriching transformer-based representations across diverse domains.
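The two ideas the abstract names, inception-style multi-scale feature extraction over token representations and dynamic token reweighting as an alternative to [CLS]-only pooling, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the kernel sizes, the scorer, and the random weights standing in for learned parameters are all illustrative assumptions.

```python
import numpy as np

def conv1d(x, kernel_size, seed):
    """1D convolution over the token axis with 'same' padding.
    x: (seq_len, d_model). Random weights stand in for learned ones."""
    seq_len, d = x.shape
    pad = kernel_size // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    w = np.random.default_rng(seed).standard_normal(
        (kernel_size, d, d)) / np.sqrt(kernel_size * d)
    out = np.zeros_like(x)
    for i in range(seq_len):
        # Contract the local window (kernel_size, d) with the kernel.
        out[i] = np.einsum("kd,kde->e", xp[i:i + kernel_size], w)
    return out

def inceptive_block(x, kernel_sizes=(1, 3, 5)):
    """Inception-style multi-branch extraction: parallel convolutions with
    different receptive fields, concatenated and projected back to d_model."""
    d = x.shape[1]
    branches = [conv1d(x, k, seed=k) for k in kernel_sizes]
    concat = np.concatenate(branches, axis=1)      # (seq_len, d * n_branches)
    proj = np.random.default_rng(0).standard_normal(
        (concat.shape[1], d)) / np.sqrt(concat.shape[1])
    return concat @ proj                           # (seq_len, d)

def reweight_tokens(x):
    """Dynamic token reweighting: a scorer assigns each token a weight and the
    pooled vector is the weighted sum, instead of reading [CLS] alone."""
    scorer = np.random.default_rng(1).standard_normal(x.shape[1])
    scores = x @ scorer                            # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over tokens
    return weights @ x                             # (d_model,)

# Toy run: 8 token vectors from a hypothetical encoder, d_model = 16.
tokens = np.random.default_rng(2).standard_normal((8, 16))
pooled = reweight_tokens(inceptive_block(tokens))
print(pooled.shape)  # (16,)
```

Each branch sees a different receptive field (unigram, trigram, 5-gram patterns), so the concatenated features mix local cues at several scales before the task-dependent weights decide which tokens dominate the pooled representation.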