Discovering Latent Graphs with GFlowNets for Diverse Conditional Image Generation

📅 2025-10-24
🤖 AI Summary
To address limited diversity in conditional image generation caused by insufficient modeling of input uncertainty, this paper proposes Rainbow: the first framework to integrate Generative Flow Networks (GFlowNets) into conditional generation. Rainbow constructs a parameterized latent graph to encode multimodal, implicit semantic representations of the condition and samples diverse trajectories in the latent space, thereby guiding pre-trained generative models to produce rich, semantically coherent, and interpretable images. Crucially, it disentangles uncertainty directly from a single condition, without relying on random seed perturbation or prompt engineering. Experiments on natural and medical imaging datasets demonstrate that Rainbow significantly improves diversity (+32.7% LPIPS) and fidelity (+18.4% FID) across image synthesis, image generation, and counterfactual generation tasks. This work establishes a novel paradigm for uncertainty-aware, controllable generation.

📝 Abstract
Capturing diversity is crucial in conditional and prompt-based image generation, particularly when conditions contain uncertainty that can lead to multiple plausible outputs. To generate diverse images reflecting this uncertainty, traditional methods often either modify random seeds, which makes it difficult to discern meaningful differences between samples, or diversify the input prompt, which limits diversity to what can be expressed verbally. We propose Rainbow, a novel conditional image generation framework, applicable to any pretrained conditional generative model, that addresses inherent condition/prompt uncertainty and generates diverse plausible images. Rainbow is based on a simple yet effective idea: decomposing the input condition into diverse latent representations, each capturing an aspect of the uncertainty and generating a distinct image. First, we integrate a latent graph, parameterized by Generative Flow Networks (GFlowNets), into the prompt representation computation. Second, leveraging GFlowNets' advanced graph sampling capabilities to capture uncertainty and output diverse trajectories over the graph, we produce multiple trajectories that collectively represent the input condition, leading to diverse condition representations and corresponding output images. Evaluations on natural image and medical image datasets demonstrate Rainbow's improvement in both diversity and fidelity across image synthesis, image generation, and counterfactual generation tasks.
Problem

Research questions and friction points this paper is trying to address.

Generating diverse images from uncertain input conditions
Decomposing input conditions into distinct latent representations
Improving diversity and fidelity in conditional image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposing input conditions into diverse latent representations
Integrating GFlowNet-parameterized latent graphs for prompt computation
Leveraging GFlowNet sampling for diverse trajectory generation
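
To make the trajectory-sampling idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: the toy graph `GRAPH`, the edge logits `LOGITS`, and the helper names are all hypothetical, and the logits are fixed by hand rather than learned with a GFlowNet training objective. It only shows how a stochastic forward policy over a small latent DAG can yield several distinct trajectories for a single condition. Discarding repeated paths is an assumption made for the illustration.

```python
import math
import random

# Hypothetical toy latent graph (a DAG): node -> list of child nodes.
GRAPH = {
    "s0": ["a", "b"],
    "a": ["c", "d"],
    "b": ["d"],
    "c": ["end"],
    "d": ["end"],
    "end": [],
}

# Hypothetical forward-policy logits per edge. A trained GFlowNet would
# learn these; here they are fixed by hand for illustration only.
LOGITS = {
    ("s0", "a"): 0.2, ("s0", "b"): 0.0,
    ("a", "c"): 0.0, ("a", "d"): 0.5,
    ("b", "d"): 0.0,
    ("c", "end"): 0.0, ("d", "end"): 0.0,
}


def forward_probs(node):
    """Softmax over the outgoing edges of `node`."""
    children = GRAPH[node]
    weights = [math.exp(LOGITS[(node, child)]) for child in children]
    total = sum(weights)
    return children, [w / total for w in weights]


def sample_trajectory(rng):
    """Walk from the source to a terminal node, sampling one edge per step."""
    node, path = "s0", ["s0"]
    while GRAPH[node]:  # stop when the node has no children
        children, probs = forward_probs(node)
        node = rng.choices(children, weights=probs, k=1)[0]
        path.append(node)
    return tuple(path)


def sample_distinct_trajectories(k, seed=0, max_tries=1000):
    """Collect up to k distinct trajectories (deduplication is an assumption)."""
    rng = random.Random(seed)
    seen = []
    for _ in range(max_tries):
        path = sample_trajectory(rng)
        if path not in seen:
            seen.append(path)
        if len(seen) == k:
            break
    return seen


if __name__ == "__main__":
    for path in sample_distinct_trajectories(k=3):
        print(" -> ".join(path))
```

In a pipeline like the one the abstract describes, each sampled trajectory would then be turned into its own condition representation and passed to the pretrained generator. That step is omitted here because the page does not specify how it is done.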
Bailey Trang
Dept. of Computer Science, Stanford University, Stanford, CA, USA
Parham Saremi
ECE student at McGill
Machine Learning, Medical Imaging, Computer Vision, Generative Modeling
Alan Q. Wang
Dept. of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
Fangrui Huang
Dept. of Computer Science, Stanford University, Stanford, CA, USA
Zahra TehraniNasab
Graduate Student, McGill University
Computer Vision, Medical Image Analysis, Deep Learning
Amar Kumar
Center for Intelligent Machines, McGill University, Montreal, QC, Canada; MILA - Quebec AI Institute, Montreal, QC, Canada
Tal Arbel
Professor of Electrical & Computer Engineering, McGill University
Computer Vision, Medical Imaging
Li Fei-Fei
Professor of Computer Science, Stanford University
Artificial Intelligence, Machine Learning, Computer Vision, Neuroscience
Ehsan Adeli
Stanford University
Computer Vision, Computational Neuroscience, Precision Healthcare, Ambient Intelligence