Neural Architecture Search by Learning a Hierarchical Search Space

📅 2025-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
In neural architecture search (NAS), Monte Carlo Tree Search (MCTS) suffers from exponential degradation in search efficiency when its initial branching decisions are misleading. Method: This paper proposes a hierarchical clustering approach based on distances between architectures' output embeddings to adaptively construct a layered MCTS branching structure, thereby optimizing the search order. It formulates the non-differentiable, high-cost NAS problem as a hierarchical decision process and, for the first time, incorporates architectural similarity into MCTS tree design to mitigate exploration bias induced by early misjudgments. Contribution/Results: Experiments on CIFAR-10 and ImageNet demonstrate that the proposed method consistently outperforms mainstream differentiable and reinforcement-learning-based NAS approaches, including DARTS and ENAS, with significantly fewer architecture evaluations. The results validate that hierarchical search space construction is critical for improving both the convergence efficiency and the generalization performance of MCTS in NAS.
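The clustering step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: each architecture is represented by a hypothetical output vector computed on a fixed probe batch (here, random vectors standing in for two behaviour groups), and agglomerative clustering over pairwise Euclidean distances yields the branching hierarchy.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Assumed setup for illustration: 6 "architectures", each represented by
# an 8-dimensional output vector. The first three behave similarly, the
# last three differently (simulated with well-separated Gaussians).
rng = np.random.default_rng(0)
outputs = np.vstack([
    rng.normal(0.0, 0.1, size=(3, 8)),   # three similar architectures
    rng.normal(5.0, 0.1, size=(3, 8)),   # three dissimilar architectures
])

# Pairwise distances between output vectors, then average-linkage
# agglomerative clustering builds the dendrogram used as the MCTS
# branching hierarchy.
dist = pdist(outputs, metric="euclidean")
tree = linkage(dist, method="average")

# Cutting the dendrogram into two groups recovers the behaviour clusters.
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

In the paper's setting, each level of the dendrogram would define one branching decision of the search tree, so architectures that behave alike share early branches.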

📝 Abstract
Monte-Carlo Tree Search (MCTS) is a powerful tool for many non-differentiable search-related problems such as adversarial games. However, the performance of such an approach depends heavily on the order of the nodes considered at each branching of the tree. If the first branches cannot distinguish between promising and deceiving configurations for the final task, the efficiency of the search is exponentially reduced. In Neural Architecture Search (NAS), as only the final architecture matters, the visiting order of the branching can be optimized to improve learning. In this paper, we study the application of MCTS to NAS for image classification. We analyze several sampling methods and branching alternatives for MCTS and propose to learn the branching by hierarchical clustering of architectures based on their similarity. The similarity is measured by the pairwise distance of the output vectors of architectures. Extensive experiments on two challenging benchmarks, CIFAR10 and ImageNet, show that MCTS, if provided with a good branching hierarchy, can yield promising solutions more efficiently than other approaches for NAS problems.
Problem

Research questions and friction points this paper is trying to address.

Optimizing node order in MCTS for Neural Architecture Search
Learning branching hierarchy via architecture similarity clustering
Improving NAS efficiency on image classification benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical clustering for architecture similarity
MCTS with optimized branching order
Output vector distance measurement
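To see why the branching order matters, the following toy example (an assumed setup, not the paper's code) runs UCB1 selection over a two-level tree whose first branching splits architectures into two clusters with hypothetical accuracies. When the split separates strong from weak architectures, the search concentrates its visits on the promising branch.

```python
import math
import random

random.seed(0)

# Hypothetical first-level split: cluster 0 holds strong architectures,
# cluster 1 holds weak ones (accuracies are made-up for illustration).
clusters = {0: [0.90, 0.85, 0.88], 1: [0.40, 0.35, 0.45]}

visits = {c: 0 for c in clusters}
total_reward = {c: 0.0 for c in clusters}

def uct(c, t, explore=0.5):
    # Standard UCB1 score: empirical mean plus an exploration bonus.
    if visits[c] == 0:
        return float("inf")
    return total_reward[c] / visits[c] + explore * math.sqrt(math.log(t) / visits[c])

for t in range(1, 201):
    c = max(clusters, key=lambda k: uct(k, t))       # select a branch
    reward = random.choice(clusters[c])              # rollout: sample one architecture
    visits[c] += 1
    total_reward[c] += reward

print(visits)
```

With a deceiving split (strong and weak architectures mixed across both branches), the two mean rewards would be nearly equal and the visit budget would be wasted evenly, which is the failure mode the learned hierarchy is meant to avoid.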