🤖 AI Summary
Existing neural network parameter generation methods suffer from poor scalability to large architectures, inconsistent layer-wise weights, and inflexible depth adaptation. To address these issues, this paper proposes an instruction-guided autoregressive parameter generation paradigm. The method conditions generation on three inputs: a task description, data characteristics, and the network architecture. It employs a VQ-VAE encoder to compress the weight space into discrete tokens and an autoregressive Transformer to model token-level inter-layer consistency, enabling flexible depth adaptation and efficient cross-task transfer. Compared with state-of-the-art approaches, the method achieves comparable or superior parameter-synthesis performance across multiple vision benchmarks, and it markedly improves generation efficiency and scalability for large models (e.g., ViT-L and ResNet-101) while supporting rapid fine-tuning, model retrieval, and on-demand parameter synthesis. This work establishes a new paradigm for lightweight, adaptive, and scalable neural network parameter generation.
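To make the VQ-VAE compression step concrete, here is a minimal sketch of how a weight tensor can be mapped to a discrete token sequence via vector quantization. The chunk size, codebook size, and nearest-neighbor lookup are illustrative assumptions; the paper's actual VQ-VAE uses a learned encoder/decoder rather than raw chunking.

```python
import numpy as np

def quantize_weights(weights, codebook, chunk=4):
    """Map flattened weight chunks to nearest-codebook token ids.
    Illustrative stand-in for a learned VQ-VAE encoder."""
    flat = weights.reshape(-1, chunk)  # split the weight tensor into chunks
    # squared distance from every chunk to every codebook vector
    d = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)  # one discrete token id per chunk

def dequantize(tokens, codebook, shape):
    """Inverse: look up codebook vectors and reshape to the weight tensor."""
    return codebook[tokens].reshape(shape)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 4))   # hypothetical 256-entry codebook
w = rng.normal(size=(8, 8))            # toy weight matrix
tokens = quantize_weights(w, codebook) # token sequence of length 16
w_hat = dequantize(tokens, codebook, w.shape)
```

Once every layer's weights are expressed as such tokens, the whole network becomes one token sequence that a Transformer can model autoregressively.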
📄 Abstract
Learning to generate neural network parameters conditioned on task descriptions and architecture specifications is pivotal for advancing model adaptability and transfer learning. Existing methods, especially those based on diffusion models, suffer from limited scalability to large architectures, rigidity in handling varying network depths, and disjointed parameter generation that undermines inter-layer coherence. In this work, we propose IGPG (Instruction Guided Parameter Generation), an autoregressive framework that unifies parameter synthesis across diverse tasks and architectures. IGPG leverages a VQ-VAE and an autoregressive model to generate neural network parameters conditioned on task instructions, dataset, and architecture details. By generating weight tokens autoregressively, IGPG ensures inter-layer coherence and enables efficient adaptation across models and datasets. Operating at the token level, IGPG effectively captures complex parameter distributions aggregated from a broad spectrum of pretrained models. Extensive experiments on multiple vision datasets demonstrate that IGPG consolidates diverse pretrained models into a single, flexible generative framework. The synthesized parameters achieve competitive or superior performance relative to state-of-the-art methods, especially in terms of scalability and efficiency when applied to large architectures. These results underscore IGPG's potential as a powerful tool for pretrained weight retrieval, model selection, and rapid task-specific fine-tuning.
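The autoregressive decoding that gives inter-layer coherence can be sketched as a standard next-token loop: each weight token is predicted conditioned on the instruction prompt and on all previously emitted tokens, so later layers' tokens depend on earlier layers'. The greedy decoding, vocabulary size, and the toy stand-in model below are assumptions for illustration; IGPG's actual model is a conditional Transformer.

```python
import numpy as np

def sample_tokens(logits_fn, prompt, n_tokens, vocab=256):
    """Greedy autoregressive decoding sketch: each new weight token is
    chosen from logits computed over the full context so far."""
    seq = list(prompt)  # instruction/condition tokens seed the context
    for _ in range(n_tokens):
        logits = logits_fn(seq)        # stand-in for the Transformer forward pass
        seq.append(int(np.argmax(logits)))
    return seq[len(prompt):]           # return only the generated weight tokens

def toy_model(seq, vocab=256):
    """Deterministic toy logits derived from the context (not a real model)."""
    h = np.random.default_rng(sum(seq) % 10_000)
    return h.normal(size=vocab)

generated = sample_tokens(toy_model, prompt=[7, 3], n_tokens=16)
```

Because decoding is token-by-token rather than per-layer, the sequence length (and hence the network depth it encodes) can vary freely, which is what enables the flexible depth adaptation claimed above.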