HyperTransport: Amortized Conditioning of T2I Generative Models

📅 2026-05-07

🤖 AI Summary

Existing activation-based intervention methods for text-to-image generation require per-concept optimization, which limits their applicability to open or evolving concept sets. This work proposes HyperTransport, a framework that uses a hypernetwork to map CLIP embeddings directly to intervention parameters and is trained end-to-end with an optimal transport loss. HyperTransport produces the intervention for an arbitrary new concept in a single forward pass. It is the first approach to combine three capabilities: amortized intervention for open concept sets, continuous and interpretable control of intervention strength, and cross-modal conditioning in which reference images steer text-conditioned image generation. Experiments on DMD2 and Nitro-1-PixArt across 167 unseen concepts show that HyperTransport matches the strongest per-concept optimization baselines at inducing the target concept while producing each intervention 3,600–7,000× faster than per-concept fitting. In pairwise comparisons against prompt engineering, both human and vision-language model (VLM) judges prefer HyperTransport about twice as often.
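The amortized pipeline can be pictured as a function from a concept embedding to intervention parameters, followed by a strength-scaled application of those parameters to an activation vector. The sketch below is illustrative only and rests on assumptions not stated in the summary: a per-unit affine intervention (scale and shift), a linear blend for the strength control, and a randomly initialized linear hypernetwork standing in for a trained one. `ToyHypernetwork` and `apply_intervention` are hypothetical names, not the paper's code.

```python
import random
from typing import List, Tuple

Vector = List[float]


class ToyHypernetwork:
    """Hypothetical linear hypernetwork: concept embedding -> per-unit (scale, shift).

    Assumption: the intervention is a per-unit affine map. The weights are random
    placeholders; a real hypernetwork would be a trained neural network.
    """

    def __init__(self, embed_dim: int, num_units: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.num_units = num_units
        # Rows [0, num_units) produce scales; rows [num_units, 2*num_units) produce shifts.
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(embed_dim)]
            for _ in range(2 * num_units)
        ]

    def __call__(self, embedding: Vector) -> Tuple[Vector, Vector]:
        outputs = [sum(w * e for w, e in zip(row, embedding)) for row in self.weights]
        # Scales are offset by 1 so that a zero embedding yields the identity map.
        scales = [1.0 + o for o in outputs[: self.num_units]]
        shifts = outputs[self.num_units :]
        return scales, shifts


def apply_intervention(h: Vector, scales: Vector, shifts: Vector, strength: float) -> Vector:
    """Blend the original activation (strength=0) with the fully transported one
    (strength=1): h' = (1 - strength) * h + strength * (scale * h + shift).
    Assumption: strength control is a linear interpolation."""
    return [
        (1.0 - strength) * x + strength * (a * x + b)
        for x, a, b in zip(h, scales, shifts)
    ]


# Usage: one hypernetwork forward pass per concept, no per-concept optimization.
hyper = ToyHypernetwork(embed_dim=4, num_units=3)
concept_embedding = [0.2, -0.1, 0.5, 0.3]  # stand-in for a CLIP embedding
scales, shifts = hyper(concept_embedding)
h = [1.0, -2.0, 0.5]
for strength in (0.0, 0.5, 1.0):
    print(strength, apply_intervention(h, scales, shifts, strength))
```

Under this parameterization, a strength of 0 leaves the activation untouched and intermediate values move it proportionally toward the transported activation, which is one simple way to obtain a continuous, interpretable control knob.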

📝 Abstract

As foundation models grow in capability, the ability to efficiently and reliably control their behavior becomes critical. Fine-tuning these models can be costly, and while prompting can be practical for controllability, it remains fragile due to models' high sensitivity to exact prompt wording and structure. This brittleness has driven interest in activation steering techniques that offer more stable and predictable control over model behavior. However, existing activation steering methods require per-concept optimization, which makes them ill-suited to deployment scenarios where the concept set is large, evolving, or only specified at request time: each new concept incurs at least minutes of optimization on the target model. We propose HyperTransport, a hypernetwork framework that amortizes this cost by mapping embeddings from a pretrained encoder (CLIP in our instantiation) directly to intervention parameters, trained end-to-end using an optimal transport loss. Once trained, HyperTransport produces each new intervention in a single hypernetwork forward pass, 3600-7000x faster than per-concept fitting. On concepts unseen during training, it matches the strongest per-concept baselines at inducing the target concept. By decoupling concept representation from intervention prediction, HyperTransport combines three capabilities that no existing approach offers as a set: amortized steering for open-ended concept sets, continuous interpretable strength control, and cross-modal conditioning where reference images can directly steer text-based generation. We validate HyperTransport on DMD2 and Nitro-1-PixArt across 167 held-out test concepts via CLIP-based metrics, a VLM-as-a-judge evaluation, and a user study. In pairwise comparisons, both human and VLM judges prefer HyperTransport over prompting ~2x as often.
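The abstract does not spell out the optimal transport loss. One plausible reading, assumed here purely for illustration, is a per-unit 1-D Wasserstein-2 distance between transported source activations and target-concept activations; in 1-D this reduces to pairing sorted samples. The helper names are hypothetical, and the closed-form Gaussian map is included only as a reference solution against which the loss can be sanity-checked.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: per-unit affine maps scored with a 1-D W2 loss; the paper's exact loss may differ.


def wasserstein2_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared 2-Wasserstein distance between two equal-size 1-D empirical
    distributions; in 1-D the optimal coupling pairs the sorted samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample counts")
    return sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def transport_loss(source: List[List[float]], target: List[List[float]],
                   scales: List[float], shifts: List[float]) -> float:
    """Sum over units of W2^2 between affinely transported source activations and
    target activations. Both inputs are laid out as [num_samples][num_units]."""
    loss = 0.0
    for j, (a, b) in enumerate(zip(scales, shifts)):
        moved = [a * row[j] + b for row in source]
        loss += wasserstein2_1d(moved, [row[j] for row in target])
    return loss


def gaussian_affine_map(src: Sequence[float], tgt: Sequence[float]) -> Tuple[float, float]:
    """Closed-form 1-D OT map between Gaussians fitted to src and tgt:
    scale = sigma_t / sigma_s, shift = mu_t - scale * mu_s."""
    mu_s, mu_t = sum(src) / len(src), sum(tgt) / len(tgt)
    sd_s = math.sqrt(sum((x - mu_s) ** 2 for x in src) / len(src))
    sd_t = math.sqrt(sum((y - mu_t) ** 2 for y in tgt) / len(tgt))
    scale = sd_t / sd_s
    return scale, mu_t - scale * mu_s


# Usage: a target that is a positive-scale affine image of the source is matched with ~zero loss.
src = [[0.0, 1.0], [1.0, -1.0], [2.0, 0.5], [3.0, 2.0]]
tgt = [[2 * x + 1, 0.5 * y - 2] for x, y in src]
params = [gaussian_affine_map([r[j] for r in src], [r[j] for r in tgt]) for j in range(2)]
scales = [p[0] for p in params]
shifts = [p[1] for p in params]
print(transport_loss(src, tgt, scales, shifts))
```

In an end-to-end setup, the scales and shifts would come from the hypernetwork and this loss would be differentiated with an autodiff framework; the pure-Python version above only evaluates the loss value.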

Problem

Research questions and friction points this paper is trying to address.

activation steering

text-to-image generation

concept conditioning

amortized control

generative models

Innovation

Methods, ideas, or system contributions that make the work stand out.

HyperTransport

activation steering

hypernetwork