🤖 AI Summary
Conventional neural network design relies on manual hyperparameter tuning or sequential neural architecture search (NAS) followed by weight training, leading to inefficiency and a fundamental decoupling of architecture and parameter optimization.
Method: We propose the first end-to-end joint optimization framework: a multi-scale universal autoencoder maps discrete architectures and their weights jointly into a continuous latent space; structure-aware embedding and differentiable sparsity regularization (relaxed L0/L1 penalties) enable gradient-driven co-optimization of both architecture and weights.
Contribution/Results: Our approach breaks the traditional NAS-then-training separation paradigm. On synthetic regression tasks, it automatically discovers high-performance, lightweight, sparse models, significantly outperforming staged search-then-train baselines. Empirical results demonstrate superior trade-offs among model compactness, accuracy, and generalization, validating the holistic advantage of joint architecture-parameter optimization.
📝 Abstract
Designing neural networks typically relies on manual trial and error or on neural architecture search (NAS) followed by weight training. The former is time-consuming and labor-intensive, while the latter decouples architecture search from weight optimization. In this paper, we propose a fundamentally different approach that simultaneously optimizes both the architecture and the weights of a neural network. Our framework first trains a universal multi-scale autoencoder that embeds both architectural and parametric information into a continuous latent space, where functionally similar neural networks are mapped closer together. Given a dataset, we then randomly initialize a point in the embedding space and update it via gradient descent to obtain the optimal neural network, jointly optimizing its structure and weights. The optimization process incorporates sparsity and compactness penalties to promote efficient models. Experiments on synthetic regression tasks demonstrate that our method effectively discovers sparse and compact neural networks with strong performance.
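The core mechanism, searching in a continuous latent space by backpropagating a task loss plus a sparsity penalty through a frozen decoder, can be sketched in miniature. In this hedged toy example the paper's trained multi-scale autoencoder is replaced by a fixed random linear decoder `D`, the "network" is a linear regression model, and all names (`D`, `z`, `lam`, `lr`) are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "decoder": a fixed linear map from a latent code z to model weights w.
# This stands in for the paper's trained autoencoder decoder.
D = rng.normal(size=(6, 4))            # weight dim 6, latent dim 4

# Synthetic regression data with sparse ground-truth weights.
X = rng.normal(size=(200, 6))
w_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)

z = rng.normal(size=4)                 # random initial point in latent space
lam, lr = 0.01, 0.02                   # L1 strength, gradient-descent step

mse0 = np.mean((X @ (D @ z) - y) ** 2)  # loss at the random init

for _ in range(1000):
    w = D @ z                          # decode latent code into weights
    resid = X @ w - y
    # Gradient of MSE + lam * ||w||_1 w.r.t. w (L1 via its subgradient)
    grad_w = X.T @ resid / len(y) + lam * np.sign(w)
    z -= lr * (D.T @ grad_w)           # chain rule back to the latent code

w = D @ z
mse = np.mean((X @ w - y) ** 2)
```

The key point the sketch illustrates is that the optimization variable is the latent code `z`, not the weights directly; the sparsity penalty acts on the decoded weights and is pulled back through the decoder, which is what makes joint structure-and-weight search differentiable in the full framework.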