Explicit Dropout: Deterministic Regularization for Transformer Architectures

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a deterministic Dropout framework. Traditional Dropout relies on random masking and has no explicit optimization objective; the framework instead reformulates it as an explicit regularization term added directly to the loss function. This removes stochastic perturbations and allows fine-grained, module-wise control over regularization strength, which suits Transformer architectures: attention components (queries, keys, values) and feed-forward networks are regularized separately. Across image classification, temporal action detection, and audio classification, the method is reported to deliver stable and efficient performance, matching or surpassing conventional Dropout while making the regularization easier to control and interpret.

📝 Abstract

Dropout is a widely used regularization technique in deep learning, but its effects are typically realized through stochastic masking rather than explicit optimization objectives. We propose a deterministic formulation that expresses dropout as an additive regularizer directly incorporated into the training loss. The framework derives explicit regularization terms for Transformer architectures, covering attention query, key, value, and feed-forward components with independently controllable strengths. This formulation removes reliance on stochastic perturbations while providing clearer, finer-grained control over regularization strength. Experiments across image classification, temporal action detection, and audio classification show that explicit dropout matches or outperforms conventional implicit methods, with consistent gains when applied to attention and feed-forward network layers. Ablation studies demonstrate stable performance and show that the regularization can be controlled through its coefficients and dropout rates. Overall, explicit dropout offers a practical and interpretable alternative to stochastic regularization while maintaining architectural flexibility across diverse tasks.
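
To make the idea of "dropout as an additive regularizer" concrete, here is a minimal sketch for the simplest possible case: a single linear unit with squared error and inverted dropout on its inputs. This is not the paper's derivation; the abstract does not give its Transformer-specific terms, so the sketch uses the classical linear-model identity, in which the expected stochastic loss equals the plain loss plus a penalty p/(1-p) * sum((x_i * w_i)^2). The function names and test values are hypothetical and chosen only for illustration.

```python
import itertools

import numpy as np


def explicit_dropout_loss(w, x, y, p):
    """Deterministic loss: squared error plus the penalty that input
    dropout with rate p induces in expectation (linear model only)."""
    residual = y - x @ w
    penalty = (p / (1.0 - p)) * np.sum((x * w) ** 2)
    return residual ** 2 + penalty


def expected_stochastic_dropout_loss(w, x, y, p):
    """Exact expectation of the inverted-dropout squared error,
    computed by enumerating every binary mask (feasible for small d)."""
    d = x.shape[0]
    total = 0.0
    for bits in itertools.product([0, 1], repeat=d):
        mask = np.array(bits, dtype=float)
        kept = mask.sum()
        # Each input is kept with probability 1 - p, dropped with probability p.
        prob = (1.0 - p) ** kept * p ** (d - kept)
        # Inverted dropout rescales kept inputs by 1 / (1 - p).
        pred = (mask * x / (1.0 - p)) @ w
        total += prob * (y - pred) ** 2
    return total


# Illustrative values (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=6)
w = rng.normal(size=6)
y = 0.5
p = 0.3

explicit = explicit_dropout_loss(w, x, y, p)
expected = expected_stochastic_dropout_loss(w, x, y, p)
print(f"explicit: {explicit:.6f}  expected over masks: {expected:.6f}")
```

The two values agree because the mask only adds zero-mean noise to the prediction, and the variance of that noise is exactly the penalty term. In this toy setting the dropout rate p plays the role of a regularization strength; the paper's framework, as summarized above, additionally assigns separate coefficients to the query, key, value, and feed-forward components.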

Problem

Research questions and friction points this paper is trying to address.

Dropout

Regularization

Transformer

Deterministic

Stochastic

Innovation

Methods, ideas, or system contributions that make the work stand out.

Explicit Dropout

Deterministic Regularization

Transformer