Explicit Dropout: Deterministic Regularization for Transformer Architectures

📅 2026-04-22
🤖 AI Summary
This work proposes a deterministic Dropout framework. Conventional Dropout relies on random masking and has no explicit optimization objective; the framework instead reformulates it as an explicit regularization term added directly to the loss function. This removes stochastic perturbations and allows fine-grained, module-wise control over regularization strength. That makes it well suited to Transformer architectures, where the attention components (queries, keys, values) and the feed-forward networks can be regularized separately. Across image classification, temporal action detection, and audio classification, the method matches or surpasses conventional Dropout with stable behavior, while offering finer controllability and a clearer theoretical interpretation.
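The summary does not state the paper's derived terms. For intuition only, the classical analysis of inverted dropout on the input of a linear predictor under squared loss (an assumption here, not the paper's Transformer derivation) shows how a random mask turns into a deterministic additive penalty:

```latex
% Assumptions: drop rate p, i.i.d. masks m_j ~ Bernoulli(1 - p),
% inverted-dropout scaling, linear predictor \hat{y} = w^\top \tilde{x}, squared loss.
\tilde{x}_j = \frac{m_j}{1-p}\, x_j, \qquad
\mathbb{E}[\tilde{x}_j] = x_j, \qquad
\operatorname{Var}[\tilde{x}_j] = \frac{p}{1-p}\, x_j^2

\mathbb{E}_{m}\!\left[(y - w^\top \tilde{x})^2\right]
  = (y - w^\top x)^2 + \frac{p}{1-p} \sum_j w_j^2 x_j^2
```

Training on the right-hand side removes the random mask entirely; a coefficient \lambda (a hypothetical symbol in this sketch) could then scale the second term independently for each module.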

📝 Abstract
Dropout is a widely used regularization technique in deep learning, but its effects are typically realized through stochastic masking rather than explicit optimization objectives. We propose a deterministic formulation that expresses dropout as an additive regularizer directly incorporated into the training loss. The framework derives explicit regularization terms for Transformer architectures, covering the attention query, key, and value components and the feed-forward components, each with an independently controllable strength. This formulation removes reliance on stochastic perturbations while providing clearer, fine-grained control over regularization strength. Experiments across image classification, temporal action detection, and audio classification show that explicit dropout matches or outperforms conventional implicit methods, with consistent gains when applied to attention and feed-forward network layers. Ablation studies demonstrate stable performance and controllable regularization through regularization coefficients and dropout rates. Overall, explicit dropout offers a practical and interpretable alternative to stochastic regularization while maintaining architectural flexibility across diverse tasks.
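A minimal NumPy sketch of module-wise explicit dropout, assuming the classical squared-loss penalty lam * p/(1 - p) * mean over rows of sum_j sum_k x_j^2 W_jk^2 for a projection y = x @ W with inverted dropout on x. The function names, the config keys "q", "k", "v", "ffn", and the per-module (lam, p) pairs are hypothetical and are not taken from the paper, whose Transformer-specific terms may differ. The final lines check by Monte Carlo that the deterministic penalty reproduces the expected loss of stochastic dropout on a linear layer.

```python
import numpy as np

def dropout_variance_penalty(x, w, p):
    """Deterministic stand-in for inverted dropout on the input x of y = x @ w.

    Under squared loss this equals the extra expected error that random masking
    injects: E_mask[||x_drop @ w - t||^2] - ||x @ w - t||^2, averaged over rows.
    x: (n, d_in) activations, w: (d_in, d_out) weights, p: drop rate in [0, 1).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("drop rate p must lie in [0, 1)")
    per_row = (x ** 2) @ (w ** 2)  # (n, d_out): sum_j x_j^2 * w_jk^2
    return p / (1.0 - p) * per_row.sum(axis=1).mean()

def explicit_dropout_loss(task_loss, modules, config):
    """Add one penalty per module, each with its own (lam, p).

    modules: name -> (input activations, weight matrix)
    config:  name -> (lam, p); the names are hypothetical labels.
    """
    total = task_loss
    for name, (x, w) in modules.items():
        lam, p = config[name]
        total += lam * dropout_variance_penalty(x, w, p)
    return total

# Monte Carlo check on a single linear layer (squared loss, summed over outputs).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w = rng.normal(size=(8, 3))
t = rng.normal(size=(4, 3))
p = 0.3

trials = 20000
masks = rng.random((trials,) + x.shape) >= p    # keep each entry with prob 1 - p
x_drop = x * masks / (1.0 - p)                  # inverted-dropout scaling
mc = ((x_drop @ w - t) ** 2).sum(axis=2).mean() # average stochastic loss

clean = ((x @ w - t) ** 2).sum(axis=1).mean()
explicit = clean + dropout_variance_penalty(x, w, p)
print(mc, explicit)  # agree up to Monte Carlo noise
```

Only the squared-loss case is exact; applying the same term to query, key, value, or feed-forward projections inside a Transformer trained with cross-entropy is a heuristic surrogate under this sketch's assumptions.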
Problem

Research questions and friction points this paper is trying to address.

Dropout
Regularization
Transformer
Deterministic
Stochastic
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explicit Dropout
Deterministic Regularization
Transformer
Additive Regularizer
Attention Mechanism