Projectable Models: One-Shot Generation of Small Specialized Transformers from Large Ones

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of deploying large Transformer models on resource-constrained devices—namely, high computational overhead and redundancy of task-agnostic knowledge—this paper proposes a **one-shot, task-oriented parameter projection method** that directly maps pretrained model parameters to compact, task-specific models without fine-tuning or additional training data. The core innovation lies in a learnable, conditional parameter projection network that extracts semantically relevant knowledge subsets and achieves structured parameter compression guided by task semantics. Evaluated on image modeling tasks, the resulting lightweight models outperform general-purpose conditional baselines in accuracy while accelerating inference by 3.2× and reducing parameter count by 91%. This approach effectively balances predictive performance, computational efficiency, and practical deployability.

📝 Abstract
Modern Foundation Models (FMs) are typically trained on corpora spanning a wide range of different data modalities, topics and downstream tasks. Utilizing these models can be very computationally expensive and is out of reach for most consumer devices. Furthermore, most of the broad FM knowledge may actually be irrelevant for a specific task at hand. Here we explore a technique for mapping parameters of a large Transformer to parameters of a smaller specialized model. By making this transformation task-specific, we aim to capture a narrower scope of the knowledge needed for performing a specific task by a smaller model. We study our method on image modeling tasks, showing that performance of generated models exceeds that of universal conditional models.
Problem

Research questions and friction points this paper is trying to address.

Reduce computational cost of large Transformers
Create smaller task-specific models
Improve performance on specialized tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-shot generation of small specialized Transformers
Mapping large Transformer to smaller model parameters
Task-specific transformation for narrower knowledge scope
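The ideas above can be sketched as a small task-conditioned projection network. This is an illustrative assumption, not the authors' actual architecture: the class name, layer sizes, and the choice to concatenate a flattened weight slice with a learned task embedding are all hypothetical, but they capture the core mechanism of mapping large-model parameters to small-model parameters in one forward pass, with no fine-tuning.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a task-conditioned parameter projector.
# It maps a flattened slice of the large model's weights, together with a
# task embedding, to the weights of a smaller model's layer. All names and
# dimensions here are illustrative, not taken from the paper.
class TaskConditionedProjector(nn.Module):
    def __init__(self, large_dim, small_dim, task_dim, hidden=256):
        super().__init__()
        # The projector conditions on the task code, so different tasks
        # extract different knowledge subsets from the same source weights.
        self.net = nn.Sequential(
            nn.Linear(large_dim + task_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, small_dim),
        )

    def forward(self, large_params, task_embedding):
        # large_params: (large_dim,) flattened weights from the big model
        # task_embedding: (task_dim,) learned code identifying the task
        x = torch.cat([large_params, task_embedding], dim=-1)
        return self.net(x)  # (small_dim,) weights for the small model


# One-shot generation: a single forward pass produces the small model's
# parameters; no gradient steps on task data are needed at this point.
proj = TaskConditionedProjector(large_dim=4096, small_dim=512, task_dim=32)
big_slice = torch.randn(4096)
task_code = torch.randn(32)
small_weights = proj(big_slice, task_code)
print(tuple(small_weights.shape))
```

In practice the projector itself would be trained (the paper describes a learnable projection), so that at deployment time only this cheap forward pass is needed per task, which is what makes the generation "one-shot".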
Andrey Zhmoginov
Google DeepMind
Plasma Physics · Machine Learning
Jihwan Lee
Google DeepMind
Mark Sandler
Google DeepMind