PolyBERT: Fine-Tuned Poly Encoder BERT-Based Model for Word Sense Disambiguation

📅 2025-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the performance bottleneck and high computational overhead of BERT-based word sense disambiguation (WSD), which stem from imbalanced local/global semantic representation and from training on every candidate sense of each target word, this paper proposes PolyBERT, a poly-encoder BERT-based model in which a multi-head attention encoder jointly models token-level (local) and sequence-level (global) semantics, augmented with Batch Contrastive Learning (BCL). BCL treats the correct senses of the other target words in the same batch as negative examples, enabling sense-aware discriminative training without feeding all possible senses of each target word, thereby significantly reducing training redundancy. On standard WSD benchmarks, PolyBERT achieves a 2% F1 improvement over strong baselines and reduces GPU training time by 37.6% compared with the variant without BCL. To the authors' knowledge, this is the first work in WSD to jointly optimize local-global semantic integration and computational efficiency.

📝 Abstract
Mainstream Word Sense Disambiguation (WSD) approaches have employed BERT to extract semantics from both context and definitions of senses to determine the most suitable sense of a target word, achieving notable performance. However, there are two limitations in these approaches. First, previous studies failed to balance the representation of token-level (local) and sequence-level (global) semantics during feature extraction, leading to insufficient semantic representation and a performance bottleneck. Second, these approaches incorporated all possible senses of each target word during the training phase, leading to unnecessary computational costs. To overcome these limitations, this paper introduces a poly-encoder BERT-based model with batch contrastive learning for WSD, named PolyBERT. Compared with previous WSD methods, PolyBERT has two improvements: (1) A poly-encoder with a multi-head attention mechanism is utilized to fuse token-level (local) and sequence-level (global) semantics, rather than focusing on just one. This approach enriches semantic representation by balancing local and global semantics. (2) To avoid redundant training inputs, Batch Contrastive Learning (BCL) is introduced. BCL utilizes the correct senses of other target words in the same batch as negative samples for the current target word, which reduces training inputs and computational cost. The experimental results demonstrate that PolyBERT outperforms baseline WSD methods such as Huang's GlossBERT and Blevins's BEM by 2% in F1-score. In addition, PolyBERT with BCL reduces GPU hours by 37.6% compared with PolyBERT without BCL.
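The batch contrastive idea described in the abstract (the correct glosses of the other target words in a batch serve as the negatives for the current one) can be sketched as an in-batch softmax contrastive loss. This is a minimal illustration, not the paper's implementation: the cosine scoring and the temperature value are assumptions.

```python
import numpy as np

def batch_contrastive_loss(context_emb, gloss_emb, temperature=0.07):
    """In-batch contrastive loss sketch: row i of context_emb is a target
    word's context vector, row i of gloss_emb is its gold sense (gloss)
    vector; the other rows of gloss_emb act as in-batch negatives."""
    # Normalize so the dot product is cosine similarity (an assumption).
    c = context_emb / np.linalg.norm(context_emb, axis=1, keepdims=True)
    g = gloss_emb / np.linalg.norm(gloss_emb, axis=1, keepdims=True)
    sim = c @ g.T / temperature                    # (B, B) similarity matrix
    # Softmax cross-entropy with the diagonal (gold pairs) as the labels.
    sim = sim - sim.max(axis=1, keepdims=True)     # numerical stability
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

A batch whose context/gloss pairs are aligned yields a near-zero loss, while mismatched pairs are heavily penalized, which is what lets training skip enumerating every sense in the inventory.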
Problem

Research questions and friction points this paper is trying to address.

Balancing token-level and sequence-level semantics for better WSD
Reducing computational costs by eliminating redundant sense inputs
Improving WSD accuracy and efficiency with PolyBERT and BCL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Poly-encoder BERT balances local and global semantics
Batch Contrastive Learning reduces computational costs
Multi-head attention enriches semantic representation
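As a rough illustration of the pooling step behind the first and third points, the general poly-encoder design uses a small set of learned "code" vectors that attend over token-level states to form several global views, which are then fused by attention from the candidate-sense vector. The sketch below follows that general design; the single fusion query and the dimensions are assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def poly_pool(token_states, codes, sense_vec):
    """Poly-encoder-style pooling (illustrative sketch): m learned codes
    attend over T token-level states (local semantics) to produce m
    sequence-level views (global semantics); the candidate-sense vector
    then attends over those views to yield one fused representation."""
    attn = softmax(codes @ token_states.T, axis=1)   # (m, T) attention weights
    views = attn @ token_states                      # (m, d) global views
    w = softmax(sense_vec @ views.T)                 # (m,) fusion weights
    return w @ views                                 # (d,) fused representation
```

Because every step is an attention-weighted average, the fused vector stays a convex combination of the token states, balancing local detail with a sequence-level summary.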
Linhan Xia
University of Oklahoma
Natural Language Processing · Deep Learning · Artificial Intelligence
Mingzhan Yang
ICNLab, Shenzhen Graduate School, Peking University, Shenzhen, P.R.China
Guohui Yuan
ICNLab, Shenzhen Graduate School, Peking University, Shenzhen, P.R.China
Shengnan Tao
ICNLab, Shenzhen Graduate School, Peking University, Shenzhen, P.R.China
Yujing Qiu
ICNLab, Shenzhen Graduate School, Peking University, Shenzhen, P.R.China
Guo Yu
University of California, Santa Barbara
High-dimensional Statistics · Statistical Machine Learning
Kai Lei
Research Professor, Peking University, Shenzhen Graduate School
Future Internet · Data Mining · Blockchain