MID-L: Matrix-Interpolated Dropout Layer with Layer-wise Neuron Selection

šŸ“… 2025-05-16
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ¤– AI Summary
Modern neural networks suffer from computational redundancy due to uniform neuron activation across all inputs. To address this, we propose MID-L, a lightweight, input-adaptive, model-agnostic dynamic sparsification module. MID-L employs differentiable Top-k selection and input-conditioned gating to generate learnable masks, enabling per-sample dynamic activation of critical neurons via matrix interpolation between dual pathways. It is the first method to jointly achieve dynamic sparsity, end-to-end differentiability, and architectural generality. Leveraging mutual information-driven sparse optimization and FLOPs-aware design, MID-L reduces average neuron activation by 55% and inference computation by 1.7Ɨ across six benchmarks, while maintaining or improving accuracy. Moreover, it significantly enhances generalization performance and robustness to input noise.

šŸ“ Abstract
Modern neural networks often activate all neurons for every input, leading to unnecessary computation and inefficiency. We introduce the Matrix-Interpolated Dropout Layer (MID-L), a novel module that dynamically selects and activates only the most informative neurons by interpolating between two transformation paths via a learned, input-dependent gating vector. Unlike conventional dropout or static sparsity methods, MID-L employs a differentiable Top-k masking strategy, enabling per-input adaptive computation while maintaining end-to-end differentiability. MID-L is model-agnostic and integrates seamlessly into existing architectures. Extensive experiments on six benchmarks, including MNIST, CIFAR-10, CIFAR-100, SVHN, UCI Adult, and IMDB, show that MID-L achieves up to a 55% average reduction in active neurons and 1.7Ɨ FLOPs savings while maintaining or exceeding baseline accuracy. We further validate the informativeness and selectivity of the learned neurons via Sliced Mutual Information (SMI) and observe improved robustness under overfitting and noisy data conditions. Additionally, MID-L demonstrates favorable inference latency and memory usage profiles, making it suitable for both research exploration and deployment on compute-constrained systems. These results position MID-L as a general-purpose, plug-and-play dynamic computation layer, bridging the gap between dropout regularization and efficient inference.
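The interpolation mechanism described in the abstract can be sketched as a minimal NumPy forward pass. This is our own illustrative reading, not the paper's implementation: the weight names `W1`, `W2`, and `Wg` are assumptions, and the hard Top-k shown here omits the paper's differentiable relaxation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def topk_mask(g, k):
    # Hard binary mask keeping the k largest gate values. The paper makes
    # this step differentiable; only the forward pass is sketched here.
    m = np.zeros_like(g)
    m[np.argsort(g)[-k:]] = 1.0
    return m

def mid_l_forward(x, W1, W2, Wg, k):
    # Input-conditioned gate: one interpolation coefficient per output neuron.
    g = sigmoid(Wg @ x)
    # Sparsify the gate so only the k most informative neurons follow path 1.
    a = topk_mask(g, k) * g
    # Matrix interpolation between the two transformation paths.
    return a * (W1 @ x) + (1.0 - a) * (W2 @ x)

rng = np.random.default_rng(0)
x = rng.normal(size=8)
W1, W2, Wg = (rng.normal(size=(16, 8)) for _ in range(3))
y = mid_l_forward(x, W1, W2, Wg, k=4)
print(y.shape)  # (16,)
```

In this sketch, neurons outside the Top-k fall back entirely to the second path; in a deployment-oriented variant that path would presumably be the cheaper of the two.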
Problem

Research questions and friction points this paper is trying to address.

Dynamic neuron selection to reduce unnecessary computation
Differentiable Top-k masking for adaptive per-input activation
Improving efficiency while maintaining or exceeding baseline accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic neuron selection via input-dependent gating vector
Differentiable Top-k masking for adaptive computation
Model-agnostic integration with existing architectures
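One way to realize the "differentiable Top-k masking" listed above is a smooth relaxation of the hard mask; the specific formulation below (a temperature-controlled sigmoid around the k-th largest gate value) is our own assumption, offered only as a sketch of the general idea.

```python
import numpy as np

def soft_topk_mask(g, k, tau=0.01):
    # Smooth stand-in for a hard Top-k mask: gates above the k-th largest
    # value map to ~1, gates below to ~0, and the mask stays differentiable
    # in g almost everywhere. As tau -> 0 it approaches the hard mask.
    threshold = np.sort(g)[-k]
    return 1.0 / (1.0 + np.exp(-(g - threshold) / tau))

m = soft_topk_mask(np.array([0.1, 0.9, 0.5, 0.3]), k=2)
# mask ~ [0, 1, 0.5, 0]: the top-2 gates are kept (the k-th sits at 0.5,
# since it coincides exactly with the threshold).
print(m)
```

The boundary neuron receiving a mask of 0.5 is an artifact of this particular relaxation; a straight-through estimator is another common route to the same end.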