🤖 AI Summary
Traditional Transformers often compress global context into a single [CLS] token, which can discard local and hierarchical semantics. To address this, we propose the Inceptive Transformer, a lightweight Transformer variant that combines Inception-style multi-branch convolutions with self-attention for multi-scale feature extraction, together with a task-aware dynamic token-reweighting strategy that adaptively scores token importance across languages and domains. Evaluated on a diverse set of tasks (emotion recognition in English and Bangla, irony detection, disease identification, and classification of anti-COVID-vaccine tweets) the Inceptive Transformer achieves consistent gains of 1%–14% over strong baselines while remaining computationally efficient. The architecture thus bridges local pattern capture and global contextual modeling without compromising scalability or cross-domain generalizability.
📝 Abstract
Conventional transformer models typically compress the information from all tokens in a sequence into a single [CLS] token to represent global context, an approach that can lead to information loss in tasks requiring localized or hierarchical cues. In this work, we introduce the Inceptive Transformer, a modular and lightweight architecture that enriches transformer-based token representations by integrating a multi-scale feature extraction module inspired by inception networks. Our model is designed to balance local and global dependencies by dynamically weighting tokens based on their relevance to a particular task. Evaluation across a diverse range of tasks, including emotion recognition (both English and Bangla), irony detection, disease identification, and classification of anti-COVID-vaccine tweets, shows that our models consistently outperform the baselines by 1% to 14% while maintaining efficiency. These findings highlight the versatility and cross-lingual applicability of our method for enriching transformer-based representations across diverse domains.
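The two ideas the abstract names, inception-style multi-scale feature extraction over token representations and dynamic token reweighting as an alternative to [CLS]-only pooling, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the kernel sizes, the scorer, and the random weights standing in for learned parameters are all illustrative assumptions.

```python
import numpy as np

def conv1d(x, kernel_size, seed):
    """1D convolution over the token axis with 'same' padding.
    x: (seq_len, d_model). Random weights stand in for learned ones."""
    seq_len, d = x.shape
    pad = kernel_size // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    w = np.random.default_rng(seed).standard_normal(
        (kernel_size, d, d)) / np.sqrt(kernel_size * d)
    out = np.zeros_like(x)
    for i in range(seq_len):
        # Contract the local window (kernel_size, d) with the kernel.
        out[i] = np.einsum("kd,kde->e", xp[i:i + kernel_size], w)
    return out

def inceptive_block(x, kernel_sizes=(1, 3, 5)):
    """Inception-style multi-branch extraction: parallel convolutions with
    different receptive fields, concatenated and projected back to d_model."""
    d = x.shape[1]
    branches = [conv1d(x, k, seed=k) for k in kernel_sizes]
    concat = np.concatenate(branches, axis=1)      # (seq_len, d * n_branches)
    proj = np.random.default_rng(0).standard_normal(
        (concat.shape[1], d)) / np.sqrt(concat.shape[1])
    return concat @ proj                           # (seq_len, d)

def reweight_tokens(x):
    """Dynamic token reweighting: a scorer assigns each token a weight and the
    pooled vector is the weighted sum, instead of reading [CLS] alone."""
    scorer = np.random.default_rng(1).standard_normal(x.shape[1])
    scores = x @ scorer                            # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over tokens
    return weights @ x                             # (d_model,)

# Toy run: 8 token vectors from a hypothetical encoder, d_model = 16.
tokens = np.random.default_rng(2).standard_normal((8, 16))
pooled = reweight_tokens(inceptive_block(tokens))
print(pooled.shape)  # (16,)
```

Each branch sees a different receptive field (unigram, trigram, 5-gram patterns), so the concatenated features mix local cues at several scales before the task-dependent weights decide which tokens dominate the pooled representation.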