🤖 AI Summary
This work addresses the challenge of interpreting and controllably editing the visual representations learned by diffusion models. The proposed method drives latent-space disentanglement from a single text prompt: it jointly trains low-rank adapters (LoRA) with text-guided direction discovery to automatically extract multiple editing directions that are semantically distinct, composable, and interpretable, learning several interpretable dimensions simultaneously from one prompt. The approach surfaces fine-grained semantic structure implicitly encoded in the diffusion latent space and provides a unified framework for concept disentanglement and the evaluation of controllable generation. Extensive experiments show superior performance over baselines on three tasks: concept decomposition, artistic style exploration, and diversity enhancement. Quantitative analysis confirms that the model's visual knowledge decomposes effectively along each learned direction, while user studies validate the improved richness and practical utility of the generated outputs.
📝 Abstract
We present SliderSpace, a framework for automatically decomposing the visual capabilities of diffusion models into controllable and human-understandable directions. Unlike existing control methods that require a user to specify attributes for each edit direction individually, SliderSpace discovers multiple interpretable and diverse directions simultaneously from a single text prompt. Each direction is trained as a low-rank adaptor, enabling compositional control and the discovery of surprising possibilities in the model's latent space. Through extensive experiments on state-of-the-art diffusion models, we demonstrate SliderSpace's effectiveness across three applications: concept decomposition, artistic style exploration, and diversity enhancement. Our quantitative evaluation shows that SliderSpace-discovered directions decompose the visual structure of the model's knowledge effectively, offering insights into the latent capabilities encoded within diffusion models. User studies further validate that our method produces more diverse and useful variations compared to baselines. Our code, data, and trained weights are available at https://sliderspace.baulab.info
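The abstract's key mechanism is that each discovered direction is a low-rank adaptor whose scalar strength acts as a "slider", and that several sliders can be composed on the same weights. The sketch below illustrates that idea in isolation with NumPy: a rank-r update `B @ A` scaled by a knob `s`, summed over directions. All names (`apply_sliders`, the matrix shapes, the random adapters) are hypothetical illustrations of the general LoRA mechanism, not the paper's actual implementation or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def apply_sliders(W, sliders, scales):
    """Compose low-rank 'slider' directions onto a weight matrix.

    Each slider is a low-rank update B @ A (rank r << d); its scalar
    scale s acts as the slider knob:  W' = W + sum_i s_i * B_i @ A_i.
    """
    W_edited = W.copy()
    for (B, A), s in zip(sliders, scales):
        W_edited += s * (B @ A)
    return W_edited

d_out, d_in, rank = 8, 8, 2
W = rng.standard_normal((d_out, d_in))

# Two hypothetical discovered directions, each a rank-2 adapter (B, A).
sliders = [
    (rng.standard_normal((d_out, rank)), rng.standard_normal((rank, d_in)))
    for _ in range(2)
]

# Scale 0 leaves a direction inactive; nonzero scales mix directions,
# so edits compose additively on the same weights.
W_base = apply_sliders(W, sliders, [0.0, 0.0])   # unchanged
W_mix = apply_sliders(W, sliders, [0.5, -1.0])   # composed edit
```

Because each update is rank 2, the composed edit `W_mix - W` has rank at most 4, which is what makes such directions cheap to store and to blend at inference time.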