🤖 AI Summary
To address the limited generality of existing methods for manipulating latent variables in generative models, this paper proposes Linear combinations of Latent variables (LOL). LOL is a general-purpose method for forming arbitrary linear combinations of latent variables, covering interpolation, subspace construction, and low-dimensional representation extraction, without additional training or fine-tuning. Its core idea is to form these combinations so that the result still adheres to the assumptions of the generative model; it applies to mainstream latent-variable paradigms including diffusion models, flow matching, and continuous normalizing flows. Unlike existing approaches, which work well only in special cases or are tied to a specific network or data modality, LOL is easy to implement and simplifies the creation of expressive low-dimensional representations of high-dimensional objects. Target applications include data synthesis, data augmentation, and experimental design or creative work that requires more control over the generation process.
📝 Abstract
Sampling from generative models has become a crucial tool for applications like data synthesis and augmentation. Diffusion, Flow Matching and Continuous Normalizing Flows have shown effectiveness across various modalities, and rely on latent variables for generation. For experimental design or creative applications that require more control over the generation process, it has become common to manipulate the latent variable directly. However, existing approaches for performing such manipulations (e.g. interpolation or forming low-dimensional representations) only work well in special cases or are network or data-modality specific. We propose Linear combinations of Latent variables (LOL) as a general-purpose method to form linear combinations of latent variables that adhere to the assumptions of the generative model. As LOL is easy to implement and naturally addresses the broader task of forming any linear combinations, e.g. the construction of subspaces of the latent space, LOL dramatically simplifies the creation of expressive low-dimensional representations of high-dimensional objects.
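To illustrate what "adhering to the assumptions of the generative model" can mean, here is a minimal sketch under a stated assumption: the latents are i.i.d. draws from a standard Gaussian prior N(0, I). In that case a weighted sum with weights w has covariance (sum of w_i squared) times I, so dividing by the Euclidean norm of the weights restores unit covariance. The function name `combine_latents` is hypothetical, and the abstract does not confirm that this is the paper's exact transformation.

```python
import numpy as np


def combine_latents(latents, weights):
    """Variance-preserving linear combination of i.i.d. N(0, I) latents.

    latents: array of shape (k, d), one latent vector per row.
    weights: array of shape (k,), arbitrary real weights (not all zero).
    Assumption (not from the abstract): the prior is a standard Gaussian.
    """
    latents = np.asarray(latents, dtype=float)
    weights = np.asarray(weights, dtype=float)
    norm = np.sqrt(np.sum(weights ** 2))
    if norm == 0.0:
        raise ValueError("weights must not all be zero")
    # sum_i w_i x_i has covariance (sum_i w_i^2) I, so dividing by the
    # Euclidean norm of the weights brings it back to unit covariance.
    return weights @ latents / norm


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 100_000))  # two hypothetical seed latents
w = np.array([0.5, 0.5])               # midpoint interpolation weights

naive = w @ x                     # plain average of the two latents
rescaled = combine_latents(x, w)  # same direction, prior-consistent scale

print(naive.var(), rescaled.var())
```

With these weights the plain average has variance close to 0.5, so it no longer looks like a draw from the prior, while the rescaled combination has variance close to 1. The same function accepts any number of latents and any weights, which is what makes constructions such as subspaces of the latent space possible in this simplified setting.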