HyperTransport: Amortized Conditioning of T2I Generative Models

📅 2026-05-07
📈 Citations: 0 · Influential citations: 0

🤖 AI Summary
Existing activation-steering methods for text-to-image generation require per-concept optimization, which makes them a poor fit for open-ended or changing concept sets. This work proposes HyperTransport, a framework in which a hypernetwork maps CLIP embeddings directly to intervention parameters and is trained end-to-end with an optimal transport loss. Once trained, HyperTransport produces the intervention for a new concept in a single hypernetwork forward pass, 3,600–7,000× faster than per-concept fitting. It combines three capabilities that no existing approach offers together: amortized steering for open-ended concept sets, continuous and interpretable strength control, and cross-modal conditioning in which reference images steer text-based generation. On DMD2 and Nitro-1-PixArt, across 167 concepts held out from training, HyperTransport matches the strongest per-concept baselines at inducing the target concept, and both human and vision-language model (VLM) judges prefer it over prompting about twice as often in pairwise comparisons.
📝 Abstract
As foundation models grow in capability, the ability to efficiently and reliably control their behavior becomes critical. Fine-tuning these models can be costly, and while prompting can be practical for controllability, it remains fragile due to models' high sensitivity to exact prompt wording and structure. This brittleness has driven interest in activation steering techniques that offer more stable and predictable control over model behavior. However, existing activation steering methods require per-concept optimization, which makes them ill-suited to deployment scenarios where the concept set is large, evolving, or only specified at request time: each new concept incurs at least minutes of optimization on the target model. We propose HyperTransport, a hypernetwork framework that amortizes this cost by mapping embeddings from a pretrained encoder (CLIP in our instantiation) directly to intervention parameters, trained end-to-end using an optimal transport loss. Once trained, HyperTransport produces each new intervention in a single hypernetwork forward pass, 3600-7000x faster than per-concept fitting. On concepts unseen during training, it matches the strongest per-concept baselines at inducing the target concept. By decoupling concept representation from intervention prediction, HyperTransport combines three capabilities that no existing approach offers as a set: amortized steering for open-ended concept sets, continuous interpretable strength control, and cross-modal conditioning where reference images can directly steer text-based generation. We validate HyperTransport on DMD2 and Nitro-1-PixArt across 167 held-out test concepts via CLIP-based metrics, a VLM-as-a-judge evaluation, and a user study. In pairwise comparisons, both human and VLM judges prefer HyperTransport over prompting ~2x as often.
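The amortization idea in the abstract can be illustrated with a small NumPy sketch. This is not the paper's implementation. It assumes a per-channel affine intervention, a single linear layer as the hypernetwork, and a per-dimension 1D optimal transport loss computed by sorting; the abstract specifies none of these details. All function and parameter names (`init_hypernetwork`, `predict_intervention`, `steer`, `per_dim_ot_loss`, `strength`) are hypothetical.

```python
import numpy as np


def init_hypernetwork(embed_dim, act_dim, seed=0):
    """Toy hypernetwork weights (assumption: one linear layer per output)."""
    rng = np.random.default_rng(seed)
    return {
        "W_scale": rng.normal(0.0, 0.01, size=(embed_dim, act_dim)),
        "W_shift": rng.normal(0.0, 0.01, size=(embed_dim, act_dim)),
    }


def predict_intervention(params, concept_embedding):
    """One forward pass: concept embedding -> affine intervention parameters.

    The embedding stands in for a CLIP text or image embedding; because both
    live in the same space, either modality could be passed in.
    """
    scale = 1.0 + concept_embedding @ params["W_scale"]  # starts near identity
    shift = concept_embedding @ params["W_shift"]
    return scale, shift


def steer(activations, scale, shift, strength=1.0):
    """Interpolate between the original and fully transported activations.

    strength=0 leaves activations unchanged; strength=1 applies the full map.
    """
    transported = activations * scale + shift
    return activations + strength * (transported - activations)


def per_dim_ot_loss(steered, target):
    """Squared 2-Wasserstein distance per dimension, averaged over dimensions.

    For equal-size 1D samples the optimal coupling matches sorted values,
    so sorting each column and comparing gives the exact 1D OT cost.
    """
    s = np.sort(steered, axis=0)
    t = np.sort(target, axis=0)
    return float(np.mean((s - t) ** 2))


# Usage with synthetic data (all values are illustrative, not from the paper).
params = init_hypernetwork(embed_dim=8, act_dim=4)
rng = np.random.default_rng(1)
embedding = rng.normal(size=8)                   # stand-in for a CLIP embedding
source = rng.normal(size=(16, 4))                # activations without the concept
target = rng.normal(loc=2.0, size=(16, 4))       # activations with the concept
scale, shift = predict_intervention(params, embedding)
half_steered = steer(source, scale, shift, strength=0.5)
loss = per_dim_ot_loss(steer(source, scale, shift), target)
```

In a training loop, the loss would be minimized with respect to the hypernetwork weights over many concepts, so that a new concept needs only the `predict_intervention` call rather than its own optimization run.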
Problem

Research questions and friction points this paper is trying to address.

activation steering
text-to-image generation
concept conditioning
amortized control
generative models
Innovation

Methods, ideas, or system contributions that make the work stand out.

HyperTransport
activation steering
hypernetwork
optimal transport
amortized conditioning