🤖 AI Summary
This work addresses idiom and metaphor classification in Konkani, a low-resource language. We propose a hybrid architecture that couples mBERT with a BiLSTM layer and, for the first time, systematically applies a gradient-driven attention-head pruning strategy to this task, improving inference efficiency without compromising classification performance. The model is fine-tuned on a newly curated, manually annotated Konkani metaphor/idiom dataset, reaching 78% accuracy on metaphor classification and 83% on idiom classification. Our contribution is twofold: (1) an efficient, deployable solution for figurative language understanding in low-resource settings; and (2) empirical evidence that structured attention-head pruning remains effective and generalizable under tight computational and data constraints.
📝 Abstract
In this paper, we address the persistent challenges that figurative language expressions pose for natural language processing (NLP) systems, particularly in low-resource languages such as Konkani. We present a hybrid model that integrates pre-trained Multilingual BERT (mBERT) with a bidirectional LSTM and a linear classifier. This architecture is fine-tuned on a newly introduced annotated dataset for metaphor classification, developed as part of this work. To improve the model's efficiency, we implement a gradient-based attention-head pruning strategy. For metaphor classification, the pruned model achieves an accuracy of 78%. We also apply our pruning approach to an existing idiom classification task, achieving 83% accuracy. These results demonstrate the effectiveness of attention-head pruning for building efficient NLP tools in underrepresented languages.
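The pipeline described above (contextual encoder → BiLSTM → linear classifier, with gradient-scored head pruning) can be sketched as follows. This is a minimal illustration, not the paper's implementation: a single gated self-attention layer stands in for mBERT so the example runs offline (in practice the encoder would be `bert-base-multilingual-cased` from HuggingFace), and the importance score `|∂L/∂gate_h|` follows the common gradient-based head-pruning recipe; all class names and dimensions here are hypothetical.

```python
import torch
import torch.nn as nn

class GatedMultiHeadAttention(nn.Module):
    """Self-attention with one learnable gate per head; setting a gate
    to 0 removes that head's contribution, i.e. prunes it."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        self.gates = nn.Parameter(torch.ones(n_heads))  # one gate per head

    def forward(self, x):
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        split = lambda t: t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)           # (B, heads, T, d_head)
        att = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads = (att @ v) * self.gates.view(1, -1, 1, 1)  # gate each head
        return self.out(heads.transpose(1, 2).reshape(B, T, D))

class HybridClassifier(nn.Module):
    """Encoder -> BiLSTM -> linear classifier, as in the abstract.
    A gated attention layer stands in for the mBERT encoder here."""
    def __init__(self, d_model=32, n_heads=4, n_classes=2):
        super().__init__()
        self.attn = GatedMultiHeadAttention(d_model, n_heads)
        self.bilstm = nn.LSTM(d_model, d_model, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * d_model, n_classes)

    def forward(self, x):
        h = self.attn(x)                  # contextual token representations
        out, _ = self.bilstm(h)           # BiLSTM over the sequence
        return self.classifier(out[:, 0]) # classify from first-token state

def head_importance(model, x, labels):
    """Gradient-based importance score |dL/d gate_h| per attention head;
    heads with the smallest scores are candidates for pruning."""
    loss = nn.functional.cross_entropy(model(x), labels)
    (grad,) = torch.autograd.grad(loss, model.attn.gates)
    return grad.abs()

# Usage: score heads on a batch, then zero out the least important gate.
model = HybridClassifier()
x = torch.randn(8, 10, 32)            # 8 sequences of length 10
y = torch.randint(0, 2, (8,))
scores = head_importance(model, x, y)
model.attn.gates.data[scores.argmin()] = 0.0  # prune one head
```

In this scheme, pruning is simply a hard zero on a head's gate; the gated model can then be fine-tuned further or have the pruned heads physically removed for inference speedups.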