Architecture-Aware Minimization (A$^2$M): How to Find Flat Minima in Neural Architecture Search

📅 2025-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor generalization of differentiable Neural Architecture Search (NAS), this work observes that high-accuracy architectures concentrate in flat regions of the architecture space, introducing the concept of "flatness" to NAS for the first time, and proposes the A²M framework. A²M analytically derives a gradient correction targeting flat minima in architecture space and designs a differentiable gradient-reweighting algorithm driven jointly by path-loss barriers and neighborhood flatness, explicitly steering optimization toward flat minima. Compatible with mainstream differentiable NAS frameworks, A²M is validated on the NAS-Bench-201 and DARTS search spaces, improving test accuracy, on average across differentiable NAS methods, by +3.60% on CIFAR-10, +4.60% on CIFAR-100, and +3.64% on ImageNet16-120. These results demonstrate substantial gains in generalization; the implementation is publicly available.
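The gradient correction described above can be illustrated with a SAM-style update on a toy objective. This is a minimal sketch, not the paper's exact derivation: `toy_arch_loss` is a hypothetical smooth surrogate standing in for the DARTS validation loss over architecture parameters, and `flatness_aware_step` shows the generic idea of evaluating the gradient at a worst-case perturbation so that descent is biased toward flat regions.

```python
import numpy as np

def toy_arch_loss(alpha):
    # Hypothetical smooth surrogate for the validation loss over
    # architecture parameters (stand-in for the DARTS bilevel loss).
    return float(np.sum(np.sin(3.0 * alpha) ** 2 + 0.1 * alpha ** 2))

def toy_arch_grad(alpha, eps=1e-5):
    # Numerical gradient of the toy loss via central finite differences.
    g = np.zeros_like(alpha)
    for i in range(alpha.size):
        e = np.zeros_like(alpha)
        e[i] = eps
        g[i] = (toy_arch_loss(alpha + e) - toy_arch_loss(alpha - e)) / (2 * eps)
    return g

def flatness_aware_step(alpha, lr=0.05, rho=0.05):
    """One SAM-style update on architecture parameters: evaluate the
    gradient at a worst-case perturbation of alpha, which penalizes
    sharp minima and biases descent toward flat regions."""
    g = toy_arch_grad(alpha)
    norm = np.linalg.norm(g) + 1e-12
    alpha_adv = alpha + rho * g / norm   # ascend toward the sharpest neighbor
    g_adv = toy_arch_grad(alpha_adv)     # gradient at the perturbed point
    return alpha - lr * g_adv            # descend with the sharpness-aware gradient
```

In A²M the analogous correction is applied to the architecture parameters of a differentiable NAS method rather than to this toy objective, and is combined with the path-barrier and neighborhood-flatness terms.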

📝 Abstract
Neural Architecture Search (NAS) has become an essential tool for designing effective and efficient neural networks. In this paper, we investigate the geometric properties of neural architecture spaces commonly used in differentiable NAS methods, specifically NAS-Bench-201 and DARTS. By defining flatness metrics such as neighborhoods and loss barriers along paths in architecture space, we reveal locality and flatness characteristics analogous to the well-known properties of neural network loss landscapes in weight space. In particular, we find that highly accurate architectures cluster together in flat regions, while suboptimal architectures remain isolated, unveiling the detailed geometrical structure of the architecture search landscape. Building on these insights, we propose Architecture-Aware Minimization (A$^2$M), a novel analytically derived algorithmic framework that explicitly biases, for the first time, the gradient of differentiable NAS methods towards flat minima in architecture space. A$^2$M consistently improves generalization over state-of-the-art DARTS-based algorithms on benchmark datasets including CIFAR-10, CIFAR-100, and ImageNet16-120, across both NAS-Bench-201 and DARTS search spaces. Notably, A$^2$M is able to increase the test accuracy, on average across different differentiable NAS methods, by +3.60% on CIFAR-10, +4.60% on CIFAR-100, and +3.64% on ImageNet16-120, demonstrating its superior effectiveness in practice. A$^2$M can be easily integrated into existing differentiable NAS frameworks, offering a versatile tool for future research and applications in automated machine learning. We open-source our code at https://github.com/AI-Tech-Research-Lab/AsquaredM.
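The two flatness metrics the abstract names can be sketched as simple functions over losses sampled in architecture space. These definitions are hypothetical reconstructions from the paper's qualitative description, not its exact formulas: a path barrier measures how much the loss rises along a path between two architectures, and neighborhood flatness averages the loss increase over architectures one mutation away.

```python
import numpy as np

def path_loss_barrier(losses_along_path):
    """Loss barrier of a path between two architectures: the highest
    loss along the path minus the larger endpoint loss. A barrier near
    zero suggests the two architectures share a flat basin."""
    losses = np.asarray(losses_along_path, dtype=float)
    return max(0.0, float(losses.max() - max(losses[0], losses[-1])))

def neighborhood_flatness(center_loss, neighbor_losses):
    """Mean loss increase over an architecture's one-mutation
    neighborhood; smaller values indicate a flatter region."""
    neighbors = np.asarray(neighbor_losses, dtype=float)
    return float(np.mean(neighbors - center_loss))
```

On a tabular benchmark such as NAS-Bench-201, the losses would come from benchmark lookups along a sequence of single-operation edits between two cells.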
Problem

Research questions and friction points this paper is trying to address.

What geometric structure do the architecture spaces of differentiable NAS methods (NAS-Bench-201, DARTS) exhibit?
Can flat minima, well studied in weight space, be defined and exploited in architecture space?
Can biasing NAS gradients toward flat architectural minima improve generalization across benchmark datasets?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Defines flatness metrics in architecture space.
Proposes Architecture-Aware Minimization (A$^2$M).
Biases NAS gradients towards flat minima.