🤖 AI Summary
This work addresses the performance collapse, poor generalization, and discretization bias in differentiable architecture search (DARTS) caused by skip-connection dominance. We propose Smooth Activation Regularization (SAR), a lightweight auxiliary loss applied directly to the architecture weights (without modifying the supernet structure or injecting noise) that uses smooth activation functions such as Softplus to encourage the weights to diverge in a balanced way and to align the optimization dynamics with the discrete architecture-selection objective. SAR is theoretically grounded and comes with convergence guarantees. Evaluated on NAS-Bench-201, image classification, and single-image super-resolution, SAR achieves new state-of-the-art (SOTA) results. Notably, it improves both the accuracy and the parameter efficiency of the Information Multi-distillation Network, demonstrating strong robustness and cross-task generalization.
📝 Abstract
Differentiable Architecture Search (DARTS) is an efficient Neural Architecture Search (NAS) method but suffers from robustness, generalization, and discretization-discrepancy issues. Much effort has gone into the performance collapse caused by skip-connection dominance, through regularization of operation weights or path weights, noise injection, and super-network redesign. It remains an open question whether there is a simpler, more elegant way to steer the search back to its intended goal: NAS is fundamentally a selection problem. In this paper, we propose a simple but effective approach, named Smooth Activation DARTS (SA-DARTS), to overcome the skip-dominance and discretization-discrepancy challenges. By applying a smooth activation function to the architecture weights as an auxiliary loss, SA-DARTS mitigates the unfair advantage of weight-free operations, converges to fanned-out architecture weight values, and can recover the search from a skip-dominated initialization. Through theoretical and empirical analysis, we show that SA-DARTS yields new state-of-the-art (SOTA) results on NAS-Bench-201, image classification, and super-resolution. Furthermore, we show that SA-DARTS can improve the performance of SOTA models with fewer parameters, such as the Information Multi-distillation Network on the super-resolution task.
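To make the idea of a smooth-activation auxiliary loss on architecture weights concrete, here is a minimal illustrative sketch in plain Python. It assumes the auxiliary term is a Softplus applied elementwise to the architecture weights and added to the task loss with a coefficient `lam`; the function names, the coefficient, and the exact formulation are assumptions for illustration and may differ from the paper's actual loss.

```python
import math

def softplus(x):
    # Numerically stable Softplus: log(1 + exp(x)).
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def sar_auxiliary_loss(arch_weights, lam=1.0):
    # Hypothetical sketch: sum a smooth activation (Softplus) over the
    # architecture weights alpha. Because Softplus is smooth and strictly
    # convex, its gradient penalizes clustered weights and pushes the
    # search toward fanned-out (well-separated) weight values, rather
    # than letting weight-free operations like skip connections dominate.
    return lam * sum(softplus(a) for a in arch_weights)

def total_loss(task_loss, arch_weights, lam=1.0):
    # The auxiliary term is simply added to the ordinary search objective.
    return task_loss + sar_auxiliary_loss(arch_weights, lam)
```

In a real DARTS implementation these weights would be the per-edge operation logits of the supernet, and the auxiliary term would be differentiated along with the task loss during the architecture-update step.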