Spanning the Visual Analogy Space with a Weight Basis of LoRAs

📅 2026-02-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing methods relying on a single LoRA module struggle to generalize across diverse visual analogy transformation tasks. To address this limitation, this work proposes LoRWeB, a novel framework that, for the first time, decomposes LoRA into learnable, interpolable, and composable transformation bases. A lightweight dynamic weight encoder is introduced to flexibly combine these bases at inference time, enabling adaptation to arbitrary visual analogy tasks. The proposed approach significantly enhances the model’s generalization capability to unseen transformations and achieves state-of-the-art performance on visual analogy benchmarks, thereby demonstrating the effectiveness of decomposing LoRA into modular bases for flexible visual manipulation.

πŸ“ Abstract
Visual analogy learning enables image manipulation through demonstration rather than textual description, allowing users to specify complex transformations that are difficult to articulate in words. Given a triplet $\{\mathbf{a}, \mathbf{a}', \mathbf{b}\}$, the goal is to generate $\mathbf{b}'$ such that $\mathbf{a} : \mathbf{a}' :: \mathbf{b} : \mathbf{b}'$. Recent methods adapt text-to-image models to this task using a single Low-Rank Adaptation (LoRA) module, but they face a fundamental limitation: attempting to capture the diverse space of visual transformations within a fixed adaptation module constrains generalization. Inspired by recent work showing that LoRAs in constrained domains span meaningful, interpolatable semantic spaces, we propose LoRWeB, a novel approach that specializes the model for each analogy task at inference time through dynamic composition of learned transformation primitives; informally, it chooses a point in a "space of LoRAs". We introduce two key components: (1) a learnable basis of LoRA modules that spans the space of different visual transformations, and (2) a lightweight encoder that dynamically selects and weighs these basis LoRAs based on the input analogy pair. Comprehensive evaluations demonstrate that our approach achieves state-of-the-art performance and significantly improves generalization to unseen visual transformations. Our findings suggest that LoRA basis decompositions are a promising direction for flexible visual manipulation. Code and data are available at https://research.nvidia.com/labs/par/lorweb
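The core mechanism described in the abstract, a weighted combination of basis LoRA modules applied to a frozen layer, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the dimensions, the number of basis elements, and the fixed mixing coefficients `alpha` (which in LoRWeB would be produced by the lightweight encoder from the analogy pair) are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions for a single frozen linear layer of the base model.
d_out, d_in = 64, 32
rank = 4          # LoRA rank (assumed value)
num_bases = 8     # size of the learned LoRA basis (assumed value)

# Frozen pretrained weight.
W = rng.standard_normal((d_out, d_in))

# A basis of LoRA modules: basis element k contributes the low-rank
# update B[k] @ A[k] of shape (d_out, d_in).
A = rng.standard_normal((num_bases, rank, d_in))
B = rng.standard_normal((num_bases, d_out, rank))

def compose_lora(W, A, B, alpha):
    """Effective weight after mixing the LoRA basis with coefficients alpha."""
    # Sum over basis index k: delta = sum_k alpha[k] * B[k] @ A[k]
    delta = np.einsum("k,kor,kri->oi", alpha, B, A)
    return W + delta

# In LoRWeB these coefficients would come from the lightweight encoder
# conditioned on (a, a'); here they are fixed for illustration.
alpha = np.zeros(num_bases)
alpha[0], alpha[3] = 0.7, 0.3

W_eff = compose_lora(W, A, B, alpha)
```

Because the composition is linear in `alpha`, zeroing all coefficients recovers the frozen weight exactly, and interpolating between two coefficient vectors interpolates the corresponding effective weights.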
Problem

Research questions and friction points this paper is trying to address.

visual analogy learning
Low-Rank Adaptation
generalization
image manipulation
LoRA

Innovation

Methods, ideas, or system contributions that make the work stand out.

LoRA basis
visual analogy
dynamic composition
low-rank adaptation
generalization