🤖 AI Summary
Deploying large Transformer models on resource-constrained devices is hampered by high computational overhead and by the redundancy of task-agnostic knowledge. To address this, the paper proposes a **one-shot, task-oriented parameter projection method** that maps pretrained model parameters directly to compact, task-specific models without fine-tuning or additional training data. The core innovation is a learnable, conditional parameter projection network that extracts semantically relevant subsets of knowledge and performs structured parameter compression guided by task semantics. Evaluated on image modeling tasks, the resulting lightweight models outperform general-purpose conditional baselines in accuracy while accelerating inference by 3.2× and reducing parameter count by 91%. The approach thus balances predictive performance, computational efficiency, and practical deployability.
📝 Abstract
Modern Foundation Models (FMs) are typically trained on corpora spanning a wide range of data modalities, topics, and downstream tasks. Utilizing these models can be very computationally expensive and is out of reach for most consumer devices. Furthermore, much of an FM's broad knowledge may be irrelevant for the specific task at hand. Here we explore a technique for mapping the parameters of a large Transformer to the parameters of a smaller, specialized model. By making this transformation task-specific, we aim to capture, in a smaller model, the narrower scope of knowledge needed to perform a specific task. We study our method on image modeling tasks, showing that the performance of the generated models exceeds that of universal conditional models.
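The core idea of the abstract can be pictured as a conditional hypernetwork: a learned function that takes (a slice of) the large model's parameters together with a task embedding and emits the parameters of a smaller model. The sketch below is illustrative only; the single-MLP architecture, all dimensions, and all names (`make_projector`, `task_emb`, etc.) are assumptions for exposition, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_projector(big_dim: int, task_dim: int, small_dim: int, hidden: int = 64):
    """Hypothetical one-hidden-layer MLP that projects large-model parameters,
    conditioned on a task embedding, down to a small model's parameter vector."""
    W1 = rng.standard_normal((big_dim + task_dim, hidden)) * 0.01
    W2 = rng.standard_normal((hidden, small_dim)) * 0.01

    def project(big_params: np.ndarray, task_emb: np.ndarray) -> np.ndarray:
        # Concatenate the pretrained parameters with the task embedding and
        # map them, in one shot, to the specialized model's parameters.
        x = np.concatenate([big_params, task_emb])
        h = np.maximum(x @ W1, 0.0)  # ReLU hidden layer
        return h @ W2                # flattened small-model parameter vector

    return project

# Toy dimensions for illustration only.
project = make_projector(big_dim=4096, task_dim=32, small_dim=512)
big_params = rng.standard_normal(4096)  # stand-in for flattened pretrained weights
task_emb = rng.standard_normal(32)      # stand-in for a task-descriptor embedding
small_params = project(big_params, task_emb)
print(small_params.shape)  # (512,)
```

In practice the projector itself would be trained so that the emitted parameters perform well on the target task; here it only demonstrates the shape of the transformation, not how it is learned.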