🤖 AI Summary
Conventional neural network design relies on manual hyperparameter tuning or sequential neural architecture search (NAS) followed by weight training, leading to inefficiency and a fundamental decoupling of architecture and parameter optimization.
Method: We propose the first end-to-end joint optimization framework: a multi-scale universal autoencoder maps discrete architectures and their weights jointly into a continuous latent space; structure-aware embedding and differentiable sparsity regularization (relaxed L0/L1 penalties) enable gradient-driven co-optimization of both architecture and weights.
Contribution/Results: Our approach breaks the traditional NAS-then-training separation paradigm. On synthetic regression tasks, it automatically discovers high-performance, lightweight, sparse models, significantly outperforming staged search-then-train baselines. Empirical results demonstrate superior trade-offs among model compactness, accuracy, and generalization, validating the holistic advantage of joint architecture-parameter optimization.
📝 Abstract
Designing neural networks typically relies on manual trial and error or on neural architecture search (NAS) followed by weight training. The former is time-consuming and labor-intensive, while the latter decouples architecture search from weight optimization. In this paper, we propose a fundamentally different approach that simultaneously optimizes both the architecture and the weights of a neural network. Our framework first trains a universal multi-scale autoencoder that embeds both architectural and parametric information into a continuous latent space, where functionally similar neural networks are mapped closer together. Given a dataset, we then randomly initialize a point in the embedding space and update it via gradient descent to obtain the optimal neural network, jointly optimizing its structure and weights. The optimization process incorporates sparsity and compactness penalties to promote efficient models. Experiments on synthetic regression tasks demonstrate that our method effectively discovers sparse and compact neural networks with strong performance.
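The core mechanism, searching in a continuous latent space by backpropagating a task loss plus a sparsity penalty through a frozen decoder, can be sketched in miniature. In this hedged toy example the paper's trained multi-scale autoencoder is replaced by a fixed random linear decoder `D`, the "network" is a linear regression model, and all names (`D`, `z`, `lam`, `lr`) are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "decoder": a fixed linear map from a latent code z to model weights w.
# This stands in for the paper's trained autoencoder decoder.
D = rng.normal(size=(6, 4))            # weight dim 6, latent dim 4

# Synthetic regression data with sparse ground-truth weights.
X = rng.normal(size=(200, 6))
w_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=200)

z = rng.normal(size=4)                 # random initial point in latent space
lam, lr = 0.01, 0.02                   # L1 strength, gradient-descent step

mse0 = np.mean((X @ (D @ z) - y) ** 2)  # loss at the random init

for _ in range(1000):
    w = D @ z                          # decode latent code into weights
    resid = X @ w - y
    # Gradient of MSE + lam * ||w||_1 w.r.t. w (L1 via its subgradient)
    grad_w = X.T @ resid / len(y) + lam * np.sign(w)
    z -= lr * (D.T @ grad_w)           # chain rule back to the latent code

w = D @ z
mse = np.mean((X @ w - y) ** 2)
```

The key point the sketch illustrates is that the optimization variable is the latent code `z`, not the weights directly; the sparsity penalty acts on the decoded weights and is pulled back through the decoder, which is what makes joint structure-and-weight search differentiable in the full framework.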