🤖 AI Summary
This work identifies a corrupted-semantics problem in masked language modeling (MLM) induced by the [MASK] token: replacing input tokens with [MASK] leaves a corrupted context that can convey multiple, ambiguous meanings and increases representation multimodality, degrading downstream task performance. To address this, the authors propose ExLM, an enhanced-context MLM whose core idea is to expand each [MASK] token into multiple states in the input context and model the dependencies between these expanded states, increasing context capacity and strengthening contextual awareness. Experiments demonstrate that ExLM significantly outperforms strong baselines, including BERT and RoBERTa, on both text understanding and molecular SMILES modeling tasks. It effectively mitigates semantic ambiguity, reduces representation uncertainty, and improves accuracy and robustness across diverse downstream applications.
📝 Abstract
Masked Language Models (MLMs) have achieved remarkable success in many self-supervised representation learning tasks. MLMs are trained by randomly replacing some tokens in the input sentences with $\texttt{[MASK]}$ tokens and predicting the original tokens based on the remaining context. This paper explores the impact of $\texttt{[MASK]}$ tokens on MLMs. Analytical studies show that masking tokens can introduce the corrupted semantics problem, wherein the corrupted context may convey multiple, ambiguous meanings. This problem is also a key factor affecting the performance of MLMs on downstream tasks. Based on these findings, we propose a novel enhanced-context MLM, ExLM. Our approach expands $\texttt{[MASK]}$ tokens in the input context and models the dependencies between these expanded states. This expansion increases context capacity and enables the model to capture richer semantic information, effectively mitigating the corrupted semantics problem during pre-training. Experimental results demonstrate that ExLM achieves significant performance improvements in both text modeling and SMILES modeling tasks. Further analysis confirms that ExLM enhances semantic representations through context enhancement, and effectively reduces the multimodality problem commonly observed in MLMs.
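To make the two ideas in the abstract concrete, here is a minimal, hedged sketch: standard MLM corruption (randomly replacing tokens with `[MASK]`) followed by a toy version of the expansion step, where each `[MASK]` is replaced by several placeholder states. The function names, the `[MASK_i]` placeholder naming, and the expansion factor `k` are illustrative assumptions, not the paper's actual implementation.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    # Standard MLM corruption: replace a random subset of tokens with [MASK].
    # Returns the corrupted sequence and, per position, the original token
    # to predict (None where the token was left untouched).
    rng = random.Random(seed)
    corrupted, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets.append(tok)
        else:
            corrupted.append(tok)
            targets.append(None)
    return corrupted, targets

def expand_masks(tokens, k=3):
    # Hypothetical sketch of ExLM-style context expansion: each [MASK] is
    # expanded into k placeholder states, increasing context capacity so the
    # model can represent multiple candidate meanings at a masked position.
    expanded = []
    for tok in tokens:
        if tok == MASK:
            expanded.extend(f"[MASK_{i}]" for i in range(k))
        else:
            expanded.append(tok)
    return expanded

corrupted, targets = mask_tokens("the cat sat on the mat".split(), mask_prob=0.4)
expanded = expand_masks(corrupted, k=2)
```

In the actual model the expanded states would be learned embeddings whose pairwise dependencies are modeled by the transformer, rather than literal string placeholders; the sketch only shows where the expansion sits in the pipeline.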